Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay...
-
Upload
luke-lucas -
Category
Documents
-
view
219 -
download
0
Transcript of Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay...
![Page 1: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/1.jpg)
Introduction to Scientific Workflows Introduction to Scientific Workflows and the KEPLER Systemand the KEPLER SystemIntroduction to Scientific Workflows Introduction to Scientific Workflows and the KEPLER Systemand the KEPLER System
Instructors:
Bertram LudaescherIlkay Altintas
Instructors:
Bertram LudaescherIlkay Altintas
![Page 2: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/2.jpg)
2Scientific Workflows, B. Ludaescher & I. Altintas
Overview
• 10:30-11:15 Introduction to Scientific Workflows
• 11:15-12:00 Scientific Workflows in KEPLER live demo, brains-on session
• … but first, one more time … (déjà déjà vu)
TM
![Page 3: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/3.jpg)
3Scientific Workflows, B. Ludaescher & I. Altintas
![Page 4: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/4.jpg)
4Scientific Workflows, B. Ludaescher & I. Altintas
Information Integration Challenges: S4 Heterogeneities• Systems Integration
– platforms, devices, data & service distribution, APIs, protocols, … Grid middleware technologies + e.g. single sign-on, platform independence, transparent use of remote
resources, …
• Syntax & Structure– heterogeneous data formats (one for each tool ...)– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …) – heterogeneous schemas (one for each DB ...) Database mediation technologies+ XML-based data exchange, integrated views, transparent query rewriting,
…
• Semantics– fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, … Knowledge representation & semantic mediation technologies+ “smart” data discovery & integration+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy
anyways!
![Page 5: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/5.jpg)
5Scientific Workflows, B. Ludaescher & I. Altintas
Information Integration Challenges: S5 Heterogeneities
• Synthesis of applications, analysis tools, data & query components, … into “scientific workflows” – How to make use of these wonderful things & put them
together to solve a scientist’s problem? Scientific Problem Solving Environments
(PSEs)GEON Portal and Workbench (“scientist’s view”)+ ontology-enhanced data registration, discovery,
manipulation+ creation and registration of new data products from
existing ones, … GEON Scientific Workflow System (“engineer’s
view”)+ for designing, re-engineering, deploying analysis pipelines
and scientific workflows; a tool to make new tools … + e.g., creation of new datasets from existing ones, dataset
registration,…
![Page 6: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/6.jpg)
6Scientific Workflows, B. Ludaescher & I. Altintas
What is a Scientific Workflow (SWF)?• Goals:
– automate a scientist’s repetitive data management and analysis tasks
– typical phases: • data access, scheduling, generation, transformation, aggregation,
analysis, visualization design, test, share, deploy, execute, reuse, … SWFs
• Typical requirements/characteristics:– data-intensive and/or compute-intensive– plumbing-intensive– dataflow-oriented– distributed (data, processing)– user-interaction “in the middle”, …– … vs. (C-z; bg; fg)-ing (“detach” and reconnect)– advanced programming constructs (map(f), zip, takewhile, …)– logging, provenance, “registering back” (intermediate) products…
• … easy to recognize a SWF when you see one!
![Page 7: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/7.jpg)
7Scientific Workflows, B. Ludaescher & I. Altintas
Promoter Identification Workflow
Source: Matt Coleman (LLNL)Source: Matt Coleman (LLNL)
![Page 8: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/8.jpg)
8Scientific Workflows, B. Ludaescher & I. Altintas
Source: NIH BIRN (Jeffrey Grethe, UCSD)Source: NIH BIRN (Jeffrey Grethe, UCSD)
![Page 9: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/9.jpg)
9Scientific Workflows, B. Ludaescher & I. Altintas
Ecology: GARP Analysis Pipeline for Invasive Species Prediction
Training sample
(d)
GARPrule set
(e)
Test sample (d)
Integrated layers
(native range) (c)
Speciespresence &
absence points(native range)
(a)EcoGridQuery
EcoGridQuery
LayerIntegration
LayerIntegration
SampleData
+A3+A2
+A1
DataCalculation
MapGeneration
Validation
User
Validation
MapGeneration
Integrated layers (invasion area) (c)
Species presence &absence points
(invasion area) (a)
Native range
predictionmap (f)
Model qualityparameter (g)
Environmental layers (native
range) (b)
GenerateMetadata
ArchiveTo Ecogrid
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
Environmental layers (invasion
area) (b)
Invasionarea prediction
map (f)
Model qualityparameter (g)
Selectedpredictionmaps (h)
Source: NSF SEEK (Deana Pennington et. al, UNM)Source: NSF SEEK (Deana Pennington et. al, UNM)
![Page 10: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/10.jpg)
10Scientific Workflows, B. Ludaescher & I. Altintas
![Page 11: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/11.jpg)
11Scientific Workflows, B. Ludaescher & I. Altintas
Digression: (Business) Workflows and
Systems
or: what you need to know when someone wants to sell you one ;-)
or: the remote relatives (2nd-3rd cousins?) of scientific workflows
![Page 12: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/12.jpg)
12Scientific Workflows, B. Ludaescher & I. Altintas
What is a (Business) Workflow?• Workflow management (also called Business Process
Management) is the coordination of work processes through software.
• A workflow management system routes pending activities to process participants according to a model of the process.
• WF management systems have been around since the late 1970s (e.g. Officetalk, Xerox PARK)– marketing waves: Office Automation (70’s-80’s), Business Process
Reengineering (90’s), Web Services Choreography (00’s)– roots/related: document management apps, email system apps, database
apps (active DBMS’s, federated DBMS’s)
– Meanwhile (69’-71’) elsewhere: Flow-based programming (J. Paul Morrison)– … not quite workflow but rather dataflow … (we’ll come to that…)
Src/cf: http://www.workflow-research.de/index.htm, M.z. Muehlen, 2003
Src/cf: http://www.workflow-research.de/index.htm, M.z. Muehlen, 2003
![Page 13: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/13.jpg)
13Scientific Workflows, B. Ludaescher & I. Altintas
Some History
Commercial Workflow Systems
Source: http://www.workflow-research.de/index.htm, M.z. Muehlen, 2003
Source: http://www.workflow-research.de/index.htm, M.z. Muehlen, 2003
![Page 14: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/14.jpg)
14Scientific Workflows, B. Ludaescher & I. Altintas
Some History
Commercial Workflow Systems
Source: http://www.workflow-research.de/index.htm, M.z. Muehlen, 2003
Source: http://www.workflow-research.de/index.htm, M.z. Muehlen, 2003
![Page 15: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/15.jpg)
15Scientific Workflows, B. Ludaescher & I. Altintas
Play Time @ Petri Nets World
• Petri Nets are the underlying abstract model of many B-WfMS’s (who said I can’t do bad acronyms, too? ;-)
• http://www.daimi.au.dk/PetriNets/
• http://www.daimi.au.dk/PetriNets/introductions/aalst/
• Let’s see the basic ideas first …
![Page 16: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/16.jpg)
16Scientific Workflows, B. Ludaescher & I. Altintas
Formal Basis: Petri Nets• Mathematical model of discrete distributed systems (named
after Carl Adam Petri, 1960’s)• Provides a modeling language w/ rich theory, analysis tools, … • A Petri net consists of places (P), transitions (T) and directed
arcs (PT or TP). Places can hold tokens. • A transition is enabled if each of its input places contains at
least one token.• An enabled transition can fire, removing input tokens and
producing output tokens
P1
P2
P3 P4T1 T2
Enabled not enabled
![Page 17: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/17.jpg)
17Scientific Workflows, B. Ludaescher & I. Altintas
Formal Basis: Petri Nets• Mathematical model of discrete distributed systems (named
after Carl Adam Petri, 1960’s)• Provides a modeling language w/ rich theory, analysis tools, … • A Petri net consists of places (P), transitions (T) and directed
arcs (PT or TP). Places can hold tokens. • A transition is enabled if each of its input places contains at
least one token.• An enabled transition can fire, removing input tokens and
producing output tokens
P1
P2
P3 P4T1 T2
Enablednot enabled
![Page 18: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/18.jpg)
18Scientific Workflows, B. Ludaescher & I. Altintas
Why Petri Nets • Modeling and designing concurrent systems w/
competing resources (dining philosophers), …
• Lots of analysis techniques, tools, theory– boundedness (state space), – liveness (good things do happen), – safety (bad things do not happen), – reversibility, – deadlock(-freeness), – reachability (of certain states),– …
![Page 19: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/19.jpg)
19Scientific Workflows, B. Ludaescher & I. Altintas
In a Flux: WS-XX-“Standards”
Source: W.M.P. van der Aalst et al. http://tmitwww.tm.tue.nl/research/patterns/http://tmitwww.tm.tue.nl/staff/wvdaalst/Publications/publications.htmlSource: W.M.P. van der Aalst et al. http://tmitwww.tm.tue.nl/research/patterns/http://tmitwww.tm.tue.nl/staff/wvdaalst/Publications/publications.html
![Page 20: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/20.jpg)
20Scientific Workflows, B. Ludaescher & I. Altintas
Everything Flows? But what exactly?
• Dataflow – Data flows through operations (zoom into your
CPU…)– Activity diagrams: data flows through actions– Process networks: data flows between processes
• Control-flow– Nodes are control-flow operations that start other
operations on a state
• Mixed approaches– Statecharts: events trigger state transitions– Petri nets: tokens mark control and dataflow– Workflow languages: mix control and dataflow– … many others …
![Page 21: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/21.jpg)
21Scientific Workflows, B. Ludaescher & I. Altintas
Scientific “Workflows” vs Business Workflows
• Business Workflows (BPEL4WS* …)– Task-orientation: travel reservations; credit approval; BPM; …– Tasks, documents, etc. undergo modifications (e.g., flight reservation
from reserved to ticketed), but modified WF objects still identifiable throughout
– Complex control flow, complex process composition (danger of control flow/dataflow “spaghetti”)
Dataflow and control-flow are often divorced!
• Scientific “Workflows”– Dataflow and data transformations– Data problems: volume, complexity, heterogeneity – Grid-aspects
• Distributed computation • Distributed data
– User-interactions/WF steering– Data, tool, and analysis integration Dataflow and control-flow are often married! (can be a happy marriage…
at times…)*Business Process Execution Language for Web Services (in case you wondered)
![Page 22: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/22.jpg)
22Scientific Workflows, B. Ludaescher & I. Altintas
Scientific “Workflows”: Some Findings
• More dataflow than (business control-/) workflow– DiscoveryNet, Kepler, SCIRun, Scitegic, Triana, Taverna, …,
• Need for “programming extensions” – Iterations over lists (foreach); filtering; functional composition;
generic & higher-order operations (zip, map(f), …)
• Need for abstraction and nested workflows• Need for data transformations (WS1DTWS2)• Need for rich user interaction & workflow steering:
– pause / revise / resume– select & branch; e.g., web browser capability at specific steps
as part of a coordinated SWF
• Need for high-throughput data transfers and CPU cyles: “(Data-)Grid-enabling”, “streaming”
• Need for persistence of intermediate products and provenance
![Page 23: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/23.jpg)
23Scientific Workflows, B. Ludaescher & I. Altintas
Perspectives on Systems
Source: Workflow-based Process Controlling, Michael zur Muehlen, 2003
Source: Workflow-based Process Controlling, Michael zur Muehlen, 2003
/ Dataflow View
![Page 24: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/24.jpg)
24Scientific Workflows, B. Ludaescher & I. Altintas
A Dataflow Component (“Actor”)
“actor” /component
inputchannels
outputchannels
ports
parameters$1, $2, …
![Page 25: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/25.jpg)
25Scientific Workflows, B. Ludaescher & I. Altintas
Actor-Oriented Design
• Object orientation:class name
data
methods
call return
What flows through an object is
sequential control (cf. CCA, MPI)
• Actor/Dataflow orientation:actor name
data (state)
portsInput data
parameters Output data
What flows through an object is a
stream of data tokens
(in SWFs/KEPLER also references!!)
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
![Page 26: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/26.jpg)
26Scientific Workflows, B. Ludaescher & I. Altintas
Object-Oriented vs.Actor-Oriented Interfaces
Actor/DataflowOriented
AO interface definition says “Give me text and I’ll give you speech”
OO interface gives procedures that have to be invoked in an order not specified as part of the interface definition.
TextToSpeech
initialize(): voidnotify(): voidisReady(): booleangetSpeech(): double[]
Object Oriented
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
![Page 27: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/27.jpg)
27Scientific Workflows, B. Ludaescher & I. Altintas
Ptolemy II
see!see!see!see!
try!try!try!try!
read!read!read!read!
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
![Page 28: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/28.jpg)
28Scientific Workflows, B. Ludaescher & I. Altintas
History• Gabriel (1986-1991)
– Written in Lisp– Aimed at signal processing– Synchronous dataflow (SDF) block diagrams – Parallel schedulers– Code generators for DSPs– Hardware/software co-simulators
• Ptolemy Classic (1990-1997)– Written in C++– Multiple models of computation– Hierarchical heterogeneity– Dataflow variants: BDF, DDF, PN– C/VHDL/DSP code generators– Optimizing SDF schedulers– Higher-order components
• Ptolemy II (1996-2022)– Written in Java– Domain polymorphism– Multithreaded– Network integrated– Modal models– Sophisticated type system– CT, HDF, CI, GR, etc.
• PtPlot (1997-??)– Java plotting package
• Tycho (1996-1998)– Itcl/Tk GUI framework
• Diva (1998-2000)– Java GUI framework
• Copernicus (code generator)
• KEPLER (2003-2028)– scientific workflow extensions
Source (Ptolemy): Edward Lee et al. http://ptolemy.eecs.berkeley.edu/Source (Ptolemy): Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
Ptolemy II: A laboratory for investigating designKEPLER: A problem-solving environment for Scientific Workflows
KEPLER = “Ptolemy II + X” for Scientific Workflows
![Page 29: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/29.jpg)
29Scientific Workflows, B. Ludaescher & I. Altintas
An “early” example: An “early” example: Promoter Identification Promoter Identification
SSDBM, AD 2003SSDBM, AD 2003
• Scientist models application as a “workflow” of connected components (“actors”)
• If all components exist, the workflow can be automated/ executed
• Different directors can be used to pick appropriate execution model (often “pipelined” execution: PN director)
![Page 30: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/30.jpg)
30Scientific Workflows, B. Ludaescher & I. Altintas
Why Ptolemy II (and thus KEPLER)?• Ptolemy II Objective:
– “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
• Dataflow Process Networks w/ natural support for abstraction, pipelining (streaming) actor-orientation, actor reuse
• User-Orientation– Workflow design & exec console (Vergil GUI)– “Application/Glue-Ware”
• excellent modeling and design support• run-time support, monitoring, …• not a middle-/underware (we use someone else’s, e.g. Globus, SRB, …)• but middle-/underware is conveniently accessible through actors!
• PRAGMATICS– Ptolemy II is mature, continuously extended & improved, well-documented
(500+pp) – open source system– Ptolemy II folks actively participate in KEPLER
![Page 31: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/31.jpg)
31Scientific Workflows, B. Ludaescher & I. Altintas
The KEPLER/Ptolemy II GUI (Vergil)
“Directors” define the component interaction & execution semantics
Large, polymorphic component (“Actors”) and Directors libraries (drag & drop)
![Page 32: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/32.jpg)
32Scientific Workflows, B. Ludaescher & I. Altintas
Ptolemy II: Actor-Oriented Modeling
• Component (“actor”) interaction semantics not hard-wired inside components, but “factored out” in a “director”
• Different directors for different modeling and execution needs (… can even be combined!)
Better abstraction, modeling, component reuse, …
![Page 33: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/33.jpg)
33Scientific Workflows, B. Ludaescher & I. Altintas
Behavioral Polymorphism in Ptolemy
«Interface»Receiver
+get() : Token+getContainer() : IOPort+hasRoom() : boolean+hasToken() : boolean+put(t : Token)+setContainer(port : IOPort)
These polymorphic methods implement the communication semantics of a domain in Ptolemy II. The receiver instance used in communication is supplied by the director, not by the component.(cf. CCA, WS-??, [G]BPL4??, … !)
produceractor
consumeractor
IOPort
Receiver
Director
Behavioral polymorphism is the idea that components can be defined to operate with multiple models of computation and multiple middleware frameworks. Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
![Page 34: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/34.jpg)
34Scientific Workflows, B. Ludaescher & I. Altintas
Domains and Directors: Semantics for
Component Interaction• CI – Push/pull component interaction• CSP – concurrent threads with
rendezvous• CT – continuous-time modeling• DE – discrete-event systems• DDE – distributed discrete events• FSM – finite state machines• DT – discrete time (cycle driven) • Giotto – synchronous periodic• GR – 2-D and 3-D graphics• PN – process networks• SDF – synchronous dataflow• SR – synchronous/reactive• TM – timed multitasking
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
For (coarse grained) Scientific Workflows!
For (finer-grained) concurrent jobs!?
![Page 35: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/35.jpg)
35Scientific Workflows, B. Ludaescher & I. Altintas
Polymorphic Actor Components Working Across Data Types and
Domains• Actor Data Polymorphism:
– Add numbers (int, float, double, Complex)– Add strings (concatenation)– Add complex types (arrays, records, matrices)– Add user-defined types
• Actor Behavioral Polymorphism:– In dataflow, add when all connected inputs have data– In a time-triggered model, add when the clock ticks– In discrete-event, add when any connected input has
data, and add in zero time– In process networks, execute an infinite loop in a
thread that blocks when reading empty inputs– In CSP, execute an infinite loop that performs
rendezvous on input or output– In push/pull, ports are push or pull (declared or
inferred) and behave accordingly– In real-time CORBA, priorities are associated with
ports and a dispatcher determines when to add
By not choosing among these when defining the component, we get a huge increment in component re-usability. But how do we ensure that the component will work in all these circumstances?
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
![Page 36: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/36.jpg)
36Scientific Workflows, B. Ludaescher & I. Altintas
Directors and Combining Different Component Interaction
Semantics
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
Possible app. in SWF:• time-series aware …• parameter-sweep aware … • MPI aware• XYZ aware … … execution models
![Page 37: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/37.jpg)
37Scientific Workflows, B. Ludaescher & I. Altintas
Component Composition & Interaction
• Components linked via ports• Dataflow (and msg/ctl-flow)• Where is the component
interaction semantics defined?? – each component is its own director!
• But still useful for special applications, e.g. parallel programs (MPI, …)
Source: GRIST/SC4DEVO workshop, July 2004, CaltechSource: GRIST/SC4DEVO workshop, July 2004, Caltech
DIR1DIR2
DIR3
DIR4
???
![Page 38: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/38.jpg)
38Scientific Workflows, B. Ludaescher & I. Altintas
CCA via special (“look the other way”) Director(s)?
CCA!?
• Dataflow in CCA• a CCA “convention” can be used to accommodate actor-
oriented/dataflow modeling• CCA/Message Passing in KEPLER
• Kepler/Ptolemy can be extended to accommodate message passing semantics (CSP is already in Ptolemy II)
![Page 39: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/39.jpg)
39Scientific Workflows, B. Ludaescher & I. Altintas
Data/Control-Flow Spectrum
• Data (tokens) flow– (almost) no other side effects– WYSIWYG (usually)
• References flow – token reference type may be “http-get”, “ftp-get”, “hsi put”… – generic handling still possible
• Application specific tokens flow– e.g. current Nimrod job management in Resurgence– “invisible contract” between components– Director is unaware of what’s going on … (sounds familiar? ;-)
• Specific messages passing protocols (e.g., CSP, MPI) – for systems of tightly coupled components
“clean” data(=ctl)-flow special tokens flow message passing, control flow
“actor”
![Page 40: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/40.jpg)
40Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER/CSP: Contributors, Sponsors, Projects
(or loosely coupled Communicating Sequential Persons ;-)Ilkay Altintas SDM, Resurgence
Kim Baldridge Resurgence, NMI Chad Berkley SEEK Shawn Bowers SEEKTerence Critchlow SDM Tobin Fricke ROADNetJeffrey Grethe BIRNChristopher H. Brooks Ptolemy II Zhengang Cheng SDM Dan Higgins SEEKEfrat Jaeger GEON Matt Jones SEEK Werner Krebs, EOLEdward A. Lee Ptolemy II Kai Lin GEONBertram Ludaescher SEEK, GEON, SDM, BIRN, ROADNetMark Miller EOLSteve Mock NMISteve Neuendorffer Ptolemy II Jing Tao SEEK Mladen Vouk SDM Xiaowen Xin SDM Yang Zhao Ptolemy IIBing Zhu SEEK •••
Ptolemy IIPtolemy II
![Page 41: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/41.jpg)
41Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER: An Open Collaboration• Initiated by members from NSF SEEK and DOE SDM/SPA; now
several other projects• Open Source (BSD-style license)• Intensive Communications:
– Web-archived mailing lists– IRC (!)
• Co-development: – via shared CVS repository– joining as a new co-developer (currently):
• get a CVS account (read-only)• local development + contribution via existing KEPLER member• be voted “in” as a member/co-developer
• Software & social engineering– How to better accommodate new groups/communities?– How to better accommodate different usage/contribution models (core
dev … special purpose extender … user)?
![Page 42: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/42.jpg)
42Scientific Workflows, B. Ludaescher & I. Altintas
GEON Dataset Generation & Registration
(a co-development in KEPLER)
Xiaowen (SDM)
Edward et al.(Ptolemy)
Yang (Ptolemy)
Efrat(GEON)
Ilkay(SDM)
SQL database access (JDBC)Matt,Chad,
Dan et al. (SEEK)
% Makefile$> ant run
% Makefile$> ant run
![Page 43: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/43.jpg)
43Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER then …
![Page 44: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/44.jpg)
44Scientific Workflows, B. Ludaescher & I. Altintas
… and KEPLER today…
Whatis
HPC?
… so,you see,scientific workflows
need domain and data-polymorphic
actors & must scale to HPC!
What’s a scientific workflow?
What’sa poly-
morphic actor?
BTW: Kepler is NOT a GUI (Vergil
is)
BTW: Kepler is NOT a GUI (Vergil
is)
![Page 45: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/45.jpg)
45Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER Pedigree (to be determined…)
Ptolemy KEPLERPtolemy IIGabriel
SCIRunKhoros
AVS
• Graphical dataflow environments• Problem solving environments• Grid workflows
DiscoveryNet
Taverna
Triana
Pegasus
Matrix
openDX
![Page 46: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/46.jpg)
46Scientific Workflows, B. Ludaescher & I. Altintas
A Few Specific Kepler Features
![Page 47: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/47.jpg)
47Scientific Workflows, B. Ludaescher & I. Altintas
Web Services Actors (WS Harvester)
12
3
4
“Minute-made” (MM) WS-based application integration• Similarly: MM workflow design & sharing w/o implemented
components
![Page 48: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/48.jpg)
48Scientific Workflows, B. Ludaescher & I. Altintas
Recent Actor Additions
![Page 49: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/49.jpg)
49Scientific Workflows, B. Ludaescher & I. Altintas
Digression: Who are the clients?
• Domain scientists1. C/Perl/Python/Java/WS/DB-enabled ones2. others (e.g. visually-inclined rest of us?)
• Goal: make the life better for both!– Workflow automation– Plumbing support– Execution monitoring, steering, runtime
revision (pause-inspect-modify-resume cycle)
![Page 50: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/50.jpg)
50Scientific Workflows, B. Ludaescher & I. Altintas
For the Geoscientist: GEON Mineral Classification
Workflow
![Page 51: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/51.jpg)
51Scientific Workflows, B. Ludaescher & I. Altintas
… inside the Classifier
BrowserUI actor w/ SVG client display
![Page 52: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/52.jpg)
52Scientific Workflows, B. Ludaescher & I. Altintas
in KEPLER (interactive session)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
![Page 53: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/53.jpg)
53Scientific Workflows, B. Ludaescher & I. Altintas
in KEPLER (w/ editable script)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
![Page 54: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/54.jpg)
54Scientific Workflows, B. Ludaescher & I. Altintas
A Closer Look at Dataflow … (or: Do you know what’s going on under your carpet? )
control tokens flow, e.g., from “$”-actor to FileReader and ImageReader
actors
actual dataflow is “under the carpet” and through handles
(file system, GridFTP, scp, SRB, …)
• Dataflow: what you see is what you get (almost…)
• Need for a general way to handle references!
![Page 55: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/55.jpg)
55Scientific Workflows, B. Ludaescher & I. Altintas
GEON Data Registration UI
![Page 56: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/56.jpg)
56Scientific Workflows, B. Ludaescher & I. Altintas
GEON Data Registration in KEPLER
![Page 57: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/57.jpg)
57Scientific Workflows, B. Ludaescher & I. Altintas
Registered Resources show up in Vergil (joint SEEK, SPA, GEON, …
Registry!?)
![Page 58: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/58.jpg)
58Scientific Workflows, B. Ludaescher & I. Altintas
Data Analysis: Biodiversity Indices
![Page 59: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/59.jpg)
59Scientific Workflows, B. Ludaescher & I. Altintas
Traffic info for a list of highways: Uses iterate (higher-order “map”) actor to access highway info web service repeatedly, sending out one email per highway.
![Page 60: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/60.jpg)
60Scientific Workflows, B. Ludaescher & I. Altintas
Traffic info for a list of highways: Uses iterate (higher-order “map”) actor to access highway info web service repeatedly, sending out one email per highway.
![Page 61: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/61.jpg)
61Scientific Workflows, B. Ludaescher & I. Altintas
Traffic info for a list of highways: Uses iterate (higher-order “map”) actor to access highway info web service repeatedly, sending out one email per highway.
![Page 62: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/62.jpg)
62Scientific Workflows, B. Ludaescher & I. Altintas
Re-engineered PIW w/ Iteration Constructs AD 2004
map(GenbankWS) Input: {“NM_001924”, “NM020375”} Output: {“CAGT…AATATGAC",“GGGGA…CAAAGA“}
![Page 63: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/63.jpg)
63Scientific Workflows, B. Ludaescher & I. Altintas
Streaming Real-time Data
Laser Strainmeter Channels in; Scientific Workflow;
Earth-tide signal out
Straightforward Example:
Seismic Waveforms
![Page 64: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/64.jpg)
64Scientific Workflows, B. Ludaescher & I. Altintas
ORB
![Page 65: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/65.jpg)
65Scientific Workflows, B. Ludaescher & I. Altintas
Job Management (here: NIMROD)
• Job management infrastructure in place• Results database: under development• Goal: 1000’s of GAMESS jobs (quantum mechanics) – Fall/Winter’04
![Page 66: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/66.jpg)
66Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER Today
• Support for SWF life cycle– Design, share, prototype, run, monitor, deploy, …
• Coarse-grained scientific workflows, e.g.,– web service actors, grid actors, command-line actors, …
• Fine grained workflows and simulations, e.g.,– Database access, XSLT transformations, …
• Kepler Extensions– SDM Center/SPA: support for data- and compute-intensive
workflows!– real-time data streaming (ROADNet)– other special and generic extensions (e.g. GEON, SEEK)
• Status– first release (alpha) was in May 2004– nightly builds w/ version tests– “Link-Up Sister Project” w/ other SWF systems (UK Taverna, Triana,
…)– Participation in various workshops and conferences (GGF10,
SSDBMs, eScience WF workshop, …)
![Page 67: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/67.jpg)
67Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER Tomorrow• Application-driven extensions:
– access to/integration with other IDMAF components • SciRUN?, PnetCDF?, PVFS(2)?, MPI-IO?, parallel-R?, ASPECT?, FastBit, …
– support for execution of new SWF domains• Astrophysics: TSI/Blondin (SPA/NCSU)• Nuclear Physics: Swesty (SPA/LLNL)• …
• Generic extensions:– addtl. support for data-intensive and compute-intensive workflows
(all SRB Scommands, CCA support, …) – (C-z; bg; fg)-ing (“detach” and reconnect)– workflow deployment models
• Additional “domain awareness” (e.g. via new directors)– time series, parameter sweeps, job scheduling, … – hybrid type system with semantic types
• Consolidation– More installers, regular releases, improved documentation, …
![Page 68: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/68.jpg)
68Scientific Workflows, B. Ludaescher & I. Altintas
Desiderata for and Features of Scientific Workflow Automation
• SWF design support – step-wise refinement, component/actor-oriented design, flow-oriented
design, sharing (visual) design with others, …– better component reuse through actor-oriented modeling w/ (largely)
independent directors
• Rapid prototyping support– Web service actors and harvester– Shell/command line actor– Data transformations (e.g., via Perl, Python, XSLT, … actors)
• Workflow “plumbing” support– data transformation actors e.g., in Perl, Python, XSLT, …
• Runtime support– Execution monitoring
• animation for SDF, planned “heartbeat” for PN, … • listening to and logging of token flow through ports and control messages of
directors
– Pause-inspect-modify-resume cycle
![Page 69: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/69.jpg)
69Scientific Workflows, B. Ludaescher & I. Altintas
F I N
Additional material ahead
![Page 70: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/70.jpg)
70Scientific Workflows, B. Ludaescher & I. Altintas
Research (and Development) Issues
…some challenges and ideas…
![Page 71: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/71.jpg)
71Scientific Workflows, B. Ludaescher & I. Altintas
“Service Composition, Orchestration” and all that stuff
• Instead of asking which WS-XXX solves this for you, ask: What is my WF composition problem?
• Also: there is a good amount of previous work, most notably from the Ptolemy group itself:– How do you model systems as interacting
components– How do you model component interaction – How can you make components and interaction
patterns as reusable as possible– … Check out actor-oriented modeling and design!
![Page 72: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/72.jpg)
72Scientific Workflows, B. Ludaescher & I. Altintas
“Programming Patterns”(Higher-Order FP Constructs)
![Page 73: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/73.jpg)
73Scientific Workflows, B. Ludaescher & I. Altintas
Traffic info for a list of highways: Uses iterate (higher-order “map”) actor to access highway info web service repeatedly, sending out one email per highway.
![Page 74: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/74.jpg)
74Scientific Workflows, B. Ludaescher & I. Altintas
Traffic info for a list of highways: Uses iterate (higher-order “map”) actor to access highway info web service repeatedly, sending out one email per highway.
![Page 75: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/75.jpg)
75Scientific Workflows, B. Ludaescher & I. Altintas
Traffic info for a list of highways: Uses iterate (higher-order “map”) actor to access highway info web service repeatedly, sending out one email per highway.
![Page 76: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/76.jpg)
76Scientific Workflows, B. Ludaescher & I. Altintas
hand-crafted control solution; also: forces sequential execution!
designed to fit
designed to fit
hand-craftedWeb-service
actor
Complex backward control-flow
No data transformations
available
[Altintas-et-al-PIW-SSDBM’03][Altintas-et-al-PIW-SSDBM’03]
![Page 77: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/77.jpg)
77Scientific Workflows, B. Ludaescher & I. Altintas
A Scientific Workflow Problem: More Solved (Computer Scientist’s view)
• Solution based on declarative, functional dataflow process network(= also a data streaming
model!)
• Higher-order constructs: map(f) no control-flow spaghetti data-intensive apps free concurrent execution free type checking automatic support to go
from piw(GeneId) to PIW :=map(piw) over [GeneId]
map(f)-style
iterators Powerful type
checking Generic,
declarative “programming”
constructs
Generic data transformation
actors
Forward-only, abstractable sub-workflow piw(GeneId)
![Page 78: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/78.jpg)
78Scientific Workflows, B. Ludaescher & I. Altintas
A Scientific Workflow Problem: Even More Solved (domain&CS coming
together!)
map(GenbankWS) Input: {“NM_001924”, “NM020375”} Output: {“CAGT…AATATGAC",“GGGGA…CAAAGA“}
![Page 79: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/79.jpg)
79Scientific Workflows, B. Ludaescher & I. Altintas
A Research Problem: Optimization by Rewriting
• Example: PIW as a declarative, referentially transparent functional process optimization via functional rewriting
possiblee.g. map(f o g) = map(f) o map(g)
• Technical report &PIW specification in Haskell
map(f o g) instead of map(f) o
map(g)
Combination of map and zip
http://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdfhttp://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
![Page 80: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/80.jpg)
81Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER Today
• Support for SWF life cycle– Design, share, prototype, run, monitor, deploy, …
• Coarse-grained scientific workflows, e.g.,– web service actors, grid actors, command-line actors, …
• Fine grained workflows and simulations, e.g.,– Database access, XSLT transformations, …
• Kepler Extensions– support for data- and compute-intensive workflows!– real-time data streaming (ROADNet)– other special and generic extensions (e.g. GEON, SEEK)
• Status– first release (alpha) was in May 2004– nightly builds w/ version tests– “Link-Up Sister Project” w/ other SWF systems (UK Taverna, Triana,
…)– Participation in various workshops and conferences (GGF10,
SSDBMs, eScience WF workshop, …)
![Page 81: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/81.jpg)
82Scientific Workflows, B. Ludaescher & I. Altintas
KEPLER Tomorrow• Application-driven extensions:
– access to/integration with other IDMAF components • SciRUN?, PnetCDF?, PVFS(2)?, MPI-IO?, parallel-R?, ASPECT?, FastBit, …
– support for execution of new SWF domains• Astrophysics: TSI/Blondin (SPA/NCSU)• Nuclear Physics: Swesty (SPA/LLNL)• …
• Generic extensions:– addtl. support for data-intensive and compute-intensive workflows
(all SRB Scommands, CCA support, …) – (C-z; bg; fg)-ing (“detach” and reconnect)– workflow deployment models
• Additional “domain awareness” (e.g. via new directors)– time series, parameter sweeps, job scheduling, … – hybrid type system with semantic types
• Consolidation– More installers, regular releases, improved documentation, …
![Page 82: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/82.jpg)
83Scientific Workflows, B. Ludaescher & I. Altintas
Towards a more concise Presentation Style …
Due to lack of time, some slides will be “by reference” only ;-)
– …Each speaker was given four minutes to present his paper, as there were so many scheduled -- 198 from 64 different countries. To help expedite the proceedings, all reports had to be distributed and studied beforehand, while the lecturer would speak only in numerals, calling attention in this fashion to the salient paragraphs of his work. ... Stan Hazelton of the U.S. delegation immediately threw the hall into a flurry by emphatically repeating: 4, 6, 11, and therefore 22; 5, 9, hence 22; 3, 7, 2, 11, from which it followed that 22 and only 22!! Someone jumped up, saying yes but 5, and what about 6, 18, or 4 for that matter; Hazelton countered this objection with the crushing retort that, either way, 22. I turned to the number key in his paper and discovered that 22 meant the end of the world… [The Futurological Congress, Stanislaw Lem, translated from the Polish by Michael Kandel, Futura 1977]
![Page 83: Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.](https://reader036.fdocuments.us/reader036/viewer/2022062519/56649ce15503460f949ab293/html5/thumbnails/83.jpg)
84Scientific Workflows, B. Ludaescher & I. Altintas
References• Kepler: http://kepler-project.org • Ptolemy: http://ptolemy.eecs.berkeley.edu/ • Flow-based Programming: http://www.jpaulmorrison.com/fbp/index.shtml• Wiki with links to others: http://www.jpaulmorrison.com/cgi-bin/wiki.pl
– http://c2.com/cgi/wiki?FlowBasedProgramming– http://c2.com/cgi/wiki?DataflowProgramming – http://c2.com/cgi/wiki?ActorsModel