Creating abstractions from scientific workflows: PhD symposium 2015

Date: 04/05/2015 Creation of abstractions in scientific workflows Daniel Garijo Verdejo, Oscar Corcho, Yolanda Gil Ontology Engineering Group. Laboratorio de Inteligencia Artificial Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid

Transcript of Creating abstractions from scientific workflows: PhD symposium 2015

Page 1: Creating abstractions from scientific workflows: PhD symposium 2015

Date: 04/05/2015

Creation of abstractions in scientific workflows

Daniel Garijo Verdejo,

Oscar Corcho, Yolanda Gil

Ontology Engineering Group. Laboratorio de Inteligencia Artificial

Departamento de Inteligencia Artificial

Facultad de Informática

Universidad Politécnica de Madrid

Page 2: Creating abstractions from scientific workflows: PhD symposium 2015


Overview: In Silico Scientific workflows

Benefits:
• Sharing and reusing previous work.
• Time savings: re-execution of old experiments with different parameters.
• Teaching: new students can learn existing methods in the lab.
• Design for modularity, so others can reuse.
• Design for standardization, reduction of heterogeneity.
• Debugging of executions.
• Paper writing, linking execution pipelines to publications.
• Reproducibility.
• Etc.

[Figure labels: Lab book, Digital Log, Laboratory Protocol (recipe), Workflow, Experiment]

Page 3: Creating abstractions from scientific workflows: PhD symposium 2015

Hypotheses

Scientific workflow repositories can be mined automatically to extract reusable patterns and abstractions that are useful for workflow developers aiming to reuse existing workflows.

• H1: It is possible to define common domain-independent patterns based on the functionality of workflow steps.
• H2: It is possible to detect common reusable patterns automatically.
• H3: Common reusable patterns are potentially useful for users.


Page 4: Creating abstractions from scientific workflows: PhD symposium 2015

Challenges

• Workflow representation
  • Heterogeneous representations.
  • Lack of a standard.
  • Lack of methodologies for publishing workflows.

• Workflow abstraction
  • There are no catalogs of the typical abstractions that can be found in scientific workflows based on their basic step functionality.
  • Difficulty in relating workflows.

• Workflow reuse
  • Difficult to determine which parts of a workflow could be reused in another workflow.

• Workflow annotation and documentation
  • Manual process.


Page 5: Creating abstractions from scientific workflows: PhD symposium 2015

Approach


Page 6: Creating abstractions from scientific workflows: PhD symposium 2015

Vocabularies and methodologies for representing and publishing workflows


[Figure: Methodology for workflow publishing. Diagram labels: WINGS on local laptop, WINGS on shared host, and WINGS on web server (each with Core, Portal, Workflow Template, Workflow Instance, PROV export); Other workflow environments; OPM/PROV conversion; Linked Data Publication; RDF Triple Store (Workflow Provenance, Workflow Plan); Users; Interactive Browsing (Pubby frontend); Programmatic access (external apps); Wings workflow generation; Publication, Share, Reuse.]

Repository of linked workflows: http://www.opmw.org/sparql

P-Plan ontology: http://purl.org/net/p-plan

OPMW ontology: http://www.opmw.org/ontology/
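As an illustration of programmatic access, the sketch below prepares a SPARQL request against the endpoint listed above using only the Python standard library. The class name `opmw:WorkflowTemplate` is assumed from the OPMW namespace, and the query shape is a guess at a typical "list templates" request. Whether the endpoint is still online has not been checked here, so the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the slide; availability is not verified.
ENDPOINT = "http://www.opmw.org/sparql"

# Assumed query: list a few workflow templates and their labels.
# opmw:WorkflowTemplate is assumed to be the template class in OPMW.
QUERY = """
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?template ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  OPTIONAL { ?template rdfs:label ?label }
} LIMIT 10
"""

def build_sparql_request(endpoint, query):
    """Build a GET request following the SPARQL protocol ('query' parameter)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for JSON results through content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send the query; the response, if the server honours the `Accept` header, would be a JSON document with one binding per template.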

Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA, 47-56.

Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
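To make the idea of a workflow plan as linked data more concrete, here is a minimal sketch that serializes a toy two-step plan as Turtle using P-Plan terms (`p-plan:Plan`, `p-plan:Step`, `p-plan:Variable`, `p-plan:isStepOfPlan`, `p-plan:hasInputVar`, `p-plan:isOutputVarOf`, `p-plan:isVariableOfPlan`). The `ex:` namespace, the plan, and the step and variable names are hypothetical, and the helper `plan_to_turtle` is not part of any published tool.

```python
# Prefix declarations; the ex: namespace is a placeholder for illustration.
PREFIXES = (
    "@prefix p-plan: <http://purl.org/net/p-plan#> .\n"
    "@prefix ex: <http://example.org/wf/> .\n"
)

def plan_to_turtle(plan_id, steps):
    """Serialize a plan as Turtle.

    steps maps a step id to {"inputs": [...], "outputs": [...]} variable ids.
    """
    lines = [PREFIXES, f"ex:{plan_id} a p-plan:Plan ."]
    variables = set()
    for step_id, io in steps.items():
        lines.append(f"ex:{step_id} a p-plan:Step ; p-plan:isStepOfPlan ex:{plan_id} .")
        for var in io.get("inputs", []):
            lines.append(f"ex:{step_id} p-plan:hasInputVar ex:{var} .")
            variables.add(var)
        for var in io.get("outputs", []):
            lines.append(f"ex:{var} p-plan:isOutputVarOf ex:{step_id} .")
            variables.add(var)
    # Declare every variable once, in a stable order.
    for var in sorted(variables):
        lines.append(f"ex:{var} a p-plan:Variable ; p-plan:isVariableOfPlan ex:{plan_id} .")
    return "\n".join(lines)

# Hypothetical plan: align reads, then filter the alignment.
toy_steps = {
    "align": {"inputs": ["reads"], "outputs": ["alignment"]},
    "filter": {"inputs": ["alignment"], "outputs": ["filtered"]},
}
turtle = plan_to_turtle("demoPlan", toy_steps)
print(turtle)
```

The output can be loaded into any RDF store; linking executions back to the plan (for example through PROV) is outside the scope of this sketch.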

Page 7: Creating abstractions from scientific workflows: PhD symposium 2015

Definition of workflow abstractions


Catalog of common domain-independent workflow abstractions (motifs)

Data-oriented motifs: What kind of manipulations does the workflow have?

Workflow-oriented motifs: How does the workflow perform its operations?

Analysis of 260 workflows from 10 domains, belonging to 5 different workflow systems

http://purl.org/net/wf-motifs#

Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, Carole Goble, Common motifs in scientific workflows: An empirical analysis, Future Generation Computer Systems, Volume 36, July 2014, Pages 338-351
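A small sketch of how data-oriented motif annotations could be summarized across a corpus: each step is tagged with a motif label, and the function reports the fraction of workflows in which each motif appears. The corpus, step names, and motif labels below are illustrative placeholders for this sketch; the actual terms should be taken from the motif catalog.

```python
from collections import Counter

# Hypothetical annotations: workflow id -> {step name: motif label}.
annotated_corpus = {
    "wf1": {"fetch": "DataRetrieval", "clean": "DataPreparation", "cluster": "DataAnalysis"},
    "wf2": {"load": "DataRetrieval", "plot": "DataVisualization"},
    "wf3": {"reformat": "DataPreparation", "merge": "DataPreparation", "plot": "DataVisualization"},
}

def motif_coverage(corpus):
    """Return, per motif, the fraction of workflows with at least one step tagged with it."""
    counts = Counter()
    for steps in corpus.values():
        # A motif counts once per workflow, however many steps carry it.
        counts.update(set(steps.values()))
    return {motif: n / len(corpus) for motif, n in counts.items()}

coverage = motif_coverage(annotated_corpus)
for motif, fraction in sorted(coverage.items()):
    print(f"{motif}: {fraction:.0%}")
```

Counting per workflow rather than per step keeps a workflow with many preparation steps from dominating the statistics.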

Page 8: Creating abstractions from scientific workflows: PhD symposium 2015

Finding and evaluating common abstractions


https://github.com/dgarijo/FragFlow

http://purl.org/net/wf-fd

Graph mining techniques

Workflow fragment representation and linkage

Workflow fragment filtering techniques

Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. FragFlow: Automated Fragment Detection in Scientific Workflows. In The 10th IEEE International Conference on e-Science, Guarujá, 2014.
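The general idea of mining and then filtering fragments can be sketched with a deliberately simplified stand-in: instead of a full frequent-subgraph miner, the code below counts labeled directed paths across a toy corpus, keeps those that occur in at least `min_support` workflows, and drops fragments contained in a longer fragment with the same support. This is not the FragFlow implementation; the corpus, the step labels, and all function names are hypothetical.

```python
from collections import defaultdict

def labeled_paths(workflow, max_len):
    """Yield label tuples for every directed path with 2..max_len steps."""
    labels, edges = workflow["labels"], workflow["edges"]
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    def walk(path):
        if len(path) >= 2:
            yield tuple(labels[node] for node in path)
        if len(path) < max_len:
            for nxt in successors[path[-1]]:
                yield from walk(path + [nxt])

    for start in labels:
        yield from walk([start])

def frequent_fragments(corpus, min_support=2, max_len=3):
    """Map each path fragment to the number of workflows containing it."""
    support = defaultdict(set)
    for wf_id, wf in corpus.items():
        for fragment in labeled_paths(wf, max_len):
            support[fragment].add(wf_id)
    return {f: len(ws) for f, ws in support.items() if len(ws) >= min_support}

def is_subpath(small, big):
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))

def filter_redundant(fragments):
    """Drop a fragment if a longer fragment with equal support contains it."""
    return {
        f: s for f, s in fragments.items()
        if not any(len(g) > len(f) and t == s and is_subpath(f, g)
                   for g, t in fragments.items())
    }

# Toy corpus: node id -> step type, plus dataflow edges.
corpus = {
    "wf1": {"labels": {"a": "Align", "b": "Filter", "c": "Plot"},
            "edges": [("a", "b"), ("b", "c")]},
    "wf2": {"labels": {"x": "Align", "y": "Filter", "z": "Stats"},
            "edges": [("x", "y"), ("y", "z")]},
    "wf3": {"labels": {"p": "Align", "q": "Filter", "r": "Plot"},
            "edges": [("p", "q"), ("q", "r")]},
}
frequent = frequent_fragments(corpus)
filtered = filter_redundant(frequent)
print(filtered)
```

In this toy corpus the fragment `Filter -> Plot` is dropped because it only ever occurs inside `Align -> Filter -> Plot`, while `Align -> Filter` survives because it also occurs in a workflow that lacks the longer fragment.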

Page 9: Creating abstractions from scientific workflows: PhD symposium 2015

Evaluation and results


Scientific workflow repositories can be mined automatically to extract reusable patterns and abstractions that are useful for workflow developers aiming to reuse existing workflows.

• Evaluation 1: Comparison against what users defined in the corpus
  • Are our patterns similar to what you identified as a useful pattern?
  • When varying the pattern frequency, up to 75% of the detected patterns are the same as the ones defined by users.

• Evaluation 2: User survey
  • Are the patterns we found that are disjoint from the user-defined ones useful?
  • 66%-100% of the proposed patterns were considered useful.
  • Survey on three corpora.
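The first evaluation can be pictured as an overlap computation repeated for several frequency settings. The sketch below assumes patterns can be compared by a simple identifier and uses made-up toy data, so its numbers are unrelated to the results reported above; `overlap_ratio` and `sweep_frequency` are hypothetical helper names.

```python
def overlap_ratio(detected, user_defined):
    """Fraction of detected patterns that also appear among user-defined ones."""
    if not detected:
        return 0.0
    return len(detected & user_defined) / len(detected)

def sweep_frequency(pattern_support, user_defined, thresholds):
    """Compute the overlap ratio for each minimum-frequency threshold."""
    results = {}
    for threshold in thresholds:
        detected = {p for p, s in pattern_support.items() if s >= threshold}
        results[threshold] = overlap_ratio(detected, user_defined)
    return results

# Toy data: pattern id -> number of workflows in which it was detected.
pattern_support = {"A>B": 5, "B>C": 4, "C>D": 3, "D>E": 2}
# Toy set of groupings that users defined by hand.
user_defined = {"A>B", "B>C", "X>Y"}

results = sweep_frequency(pattern_support, user_defined, thresholds=[2, 3, 4])
for threshold, ratio in results.items():
    print(f"min frequency {threshold}: {ratio:.0%} of detected patterns match")
```

Raising the threshold shrinks the detected set, which is why the matching fraction can change as the frequency is varied.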

Page 10: Creating abstractions from scientific workflows: PhD symposium 2015

Summary


• Workflow representation
  • Models based on standards for representing workflow provenance and workflow templates.
  • Adapted a commonly used methodology for publishing workflows as web objects.

• Workflow abstraction
  • Defined a catalog of common domain-independent abstractions, based on their functionality.
  • Provided an ontology for semi-automatic annotation.

• Workflow reuse
  • Automatic detection and annotation of common useful patterns given a workflow corpus.
  • Models to describe how patterns link and relate different workflows in a workflow corpus.

Page 11: Creating abstractions from scientific workflows: PhD symposium 2015


Collaborators and co-authors

• Daniel Garijo, Oscar Corcho. Ontology Engineering Group, UPM

• Yolanda Gil. Information Sciences Institute, USC

• Boris A. Gutman, Ivo D. Dinov, Paul Thompson, Arthur W. Toga, Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad. USC Laboratory of Neuro Imaging

IEEE eScience 2014. Guarujá, Brasil

• Pinar Alper, Khalid Belhajjame, Carole Goble