Creating abstractions from scientific workflows: PhD symposium 2015
Date: 04/05/2015
Creation of abstractions in scientific workflows
Daniel Garijo Verdejo,
Oscar Corcho, Yolanda Gil
Ontology Engineering Group. Laboratorio de Inteligencia Artificial
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Overview: In Silico Scientific workflows
Benefits:
•Sharing and reusing previous work.
•Time savings: re-execution of old experiments with different parameters.
•Teaching: new students can learn existing methods in the lab.
•Design for modularity, so others can reuse.
•Design for standardization, reducing heterogeneity.
•Debugging of executions.
•Paper writing: linking execution pipelines to publications.
•Reproducibility.
•Etc.
[Figure labels: Lab book, Digital Log, Laboratory Protocol (recipe), Workflow, Experiment]
Hypotheses
Scientific workflow repositories can be mined automatically to extract reusable patterns and abstractions that are useful for workflow developers aiming to reuse existing workflows.
•H1: It is possible to define common domain-independent patterns based on the functionality of workflow steps.
•H2: It is possible to detect common reusable patterns automatically.
•H3: Common reusable patterns are potentially useful for users.
Challenges
•Workflow representation
  •Heterogeneous representations.
  •Lack of a standard.
  •Lack of methodologies for publishing workflows.
•Workflow abstraction
  •There are no catalogs of the typical abstractions that can be found in scientific workflows based on their basic step functionality.
  •Difficulty in relating workflows.
•Workflow reuse
  •Difficult to determine which parts of a workflow could be reused in another workflow.
•Workflow annotation and documentation
  •Manual process.
Approach
Vocabularies and methodologies for representing and publishing workflows
Methodology for workflow publishing
[Figure: workflow publishing architecture. Stage labels: Publication, Share, Reuse. Components: WINGS on local laptop, WINGS on shared host and WINGS on web server (each labeled Core, Portal, Workflow Template, Workflow Instance, PROV export); Wings workflow generation; OPM/PROV conversion; Other workflow environments; Linked Data Publication; RDF Triple Store (Workflow Provenance, Workflow Plan); Users; Interactive Browsing (Pubby frontend); Programmatic access (external apps).]
Repository of linked workflows: http://www.opmw.org/sparql
P-Plan ontology: http://purl.org/net/p-plan
OPMW ontology: http://www.opmw.org/ontology/
Daniel Garijo and Yolanda Gil. A new approach for publishing workflows: abstractions, standards, and linked data. In WORKS '11. ACM, New York, NY, USA, 2011, pages 47-56.
Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
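As a rough illustration of programmatic access to the repository of linked workflows, the sketch below builds a SPARQL query and the matching HTTP GET URL using only the Python standard library. The endpoint and ontology namespace come from the slide; the class name `opmw:WorkflowTemplate`, the use of `rdfs:label`, and the `query` request parameter (the usual SPARQL protocol convention) are assumptions about what the endpoint stores and accepts, and the helper names are hypothetical. No request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://www.opmw.org/sparql"     # endpoint listed on the slide
OPMW_NS = "http://www.opmw.org/ontology/"   # ontology namespace listed on the slide


def build_template_query(limit=10):
    """Build a SPARQL query listing workflow templates.

    Assumption: templates are typed as opmw:WorkflowTemplate and may
    carry an rdfs:label.
    """
    return (
        f"PREFIX opmw: <{OPMW_NS}>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?template ?label WHERE {\n"
        "  ?template a opmw:WorkflowTemplate .\n"
        "  OPTIONAL { ?template rdfs:label ?label }\n"
        f"}} LIMIT {int(limit)}"
    )


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET URL (assumes the standard 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = build_template_query(limit=5)
    print(query)
    print(build_request_url(query))
```

The resulting URL could then be fetched with `urllib.request.urlopen`, typically with an `Accept` header requesting a SPARQL results format; whether the endpoint is still online is not something this sketch can guarantee.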
Definition of workflow abstractions
Catalog of common domain-independent workflow abstractions (motifs)
Data-oriented motifs: What kind of manipulations does the workflow have?
Workflow-oriented motifs: How does the workflow perform its operations?
Analysis of 260 workflows from 10 domains, belonging to 5 different workflow systems
http://purl.org/net/wf-motifs#
Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, Carole Goble, Common motifs in scientific workflows: An empirical analysis, Future Generation Computer Systems, Volume 36, July 2014, Pages 338-351
Finding and evaluating common abstractions
https://github.com/dgarijo/FragFlow
http://purl.org/net/wf-fd
Graph mining techniques
Workflow fragment representation and linkage
Workflow fragment filtering techniques
Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. FragFlow: Automated Fragment Detection in Scientific Workflows. In The 10th IEEE International Conference on e-Science, Guaruja, 2014.
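To make the idea of mining a workflow corpus for common fragments more concrete, here is a deliberately simplified sketch. It is not FragFlow's algorithm: real graph mining techniques search for connected subgraphs of arbitrary size, whereas this example only counts the smallest possible fragment, a pair of directly connected steps, and keeps the pairs that appear in at least `min_support` workflows. The corpus, the step names, and the function name are hypothetical.

```python
from collections import Counter


def frequent_step_pairs(workflows, min_support=2):
    """Return the (producer, consumer) step pairs found in at least
    `min_support` workflows, mapped to the number of workflows
    containing them.

    `workflows` maps a workflow id to a list of directed edges between
    step types. This is a toy stand-in for frequent subgraph mining.
    """
    support = Counter()
    for edges in workflows.values():
        # A pair counts once per workflow, however often it repeats there.
        support.update(set(edges))
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical corpus: three workflows described by their step-to-step edges.
corpus = {
    "wf1": [("Align", "Filter"), ("Filter", "Plot")],
    "wf2": [("Align", "Filter"), ("Filter", "Merge")],
    "wf3": [("Sort", "Plot"), ("Align", "Filter"), ("Align", "Filter")],
}

print(frequent_step_pairs(corpus, min_support=2))
```

Counting support per workflow rather than per occurrence keeps a single workflow with many repeated steps from dominating the result, which is one simple way to think about the filtering step mentioned above.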
Evaluation and results
Scientific workflow repositories can be mined automatically to extract reusable patterns and abstractions that are useful for workflow developers aiming to reuse existing
workflows.
•Evaluation 1: Comparison against what users defined in the corpus
  •Are our patterns similar to what you identified as a useful pattern?
  •When varying the pattern frequency, up to 75% of the detected patterns are the same as the ones defined by users.
•Evaluation 2: User survey
  •Are the patterns we found that are disjoint from the user-defined ones useful?
  •66%-100% of the proposed patterns were considered useful.
  •Survey on three corpora.
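The comparison in Evaluation 1 can be pictured as a set overlap between detected and user-defined patterns. The sketch below is one possible reading, assuming a pattern is represented as a set of step-to-step edges and that two patterns match only when they are exactly equal; the actual matching criterion used in the evaluation is not stated on the slide. The data is a toy example and the function name is hypothetical.

```python
def overlap_ratio(detected, user_defined):
    """Fraction of detected patterns that also appear among the
    user-defined ones (exact equality of edge sets is assumed)."""
    detected = set(detected)
    if not detected:
        return 0.0
    return len(detected & set(user_defined)) / len(detected)


# Toy data: each pattern is a frozenset of (producer, consumer) edges.
detected = [
    frozenset({("A", "B")}),
    frozenset({("B", "C")}),
    frozenset({("A", "B"), ("B", "C")}),
    frozenset({("C", "D")}),
]
user_defined = [
    frozenset({("A", "B")}),
    frozenset({("A", "B"), ("B", "C")}),
]

print(overlap_ratio(detected, user_defined))
```

The detected patterns that fall outside the overlap are the ones a user survey such as Evaluation 2 would then ask about.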
Summary
•Workflow representation
  •Models based on standards for representing workflow provenance and workflow templates.
  •Adapted a commonly used methodology for publishing workflows as web objects.
•Workflow abstraction
  •Defined a catalog of common domain-independent abstractions, based on their functionality.
  •Provided an ontology for semi-automatic annotation.
•Workflow reuse
  •Automatic detection and annotation of common useful patterns given a workflow corpus.
  •Models describing how patterns link and relate different workflows in a workflow corpus.
Collaborators and co-authors
•Daniel Garijo, Oscar Corcho. Ontology Engineering Group, UPM
•Yolanda Gil. Information Sciences Institute, USC
•Boris A. Gutman, Ivo D. Dinov, Paul Thompson, Arthur W. Toga, Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad. USC Laboratory of Neuro Imaging
IEEE eScience 2014. Guarujá, Brasil
•Pinar Alper, Khalid Belhajjame, Carole Goble