Wf4Ever: Preserving workflows as digital Research Objects EGI Community Forum 2012, Workflow Systems workshop Leibniz Supercomputing Centre, Munich, 2012-03-28 Stian Soiland-Reyes myGrid, University of Manchester

Page 1:

Wf4Ever: Preserving workflows as digital Research Objects

EGI Community Forum 2012, Workflow Systems workshop
Leibniz Supercomputing Centre, Munich, 2012-03-28

Stian Soiland-Reyes
myGrid, University of Manchester

Page 2:

My background

myExperiment - Web 3.0 virtual environment, library and social network for workflows

~5000 registered users
~2200 workflows
~21 different systems

Taverna - Scientific Workflow Management System

~85000 downloads
EU projects: SCAPE, BioVeL, HELIO, e-Lico, VPH-SHARE, EGI-InSPIRE…

http://www.myexperiment.org/

http://www.taverna.org.uk/

Page 3:

“A biologist would rather share their toothbrush than their gene name”

Mike Ashburner and others
Professor in Dept of Genetics, University of Cambridge, UK

Page 4:

“Facebook for Scientists”...but different to Facebook!

A repository of research methods

A social network of people and things

A Social Virtual Research Environment

A probe into researcher behaviour

Open source (BSD) Ruby on Rails app

REST and SPARQL, Linked Data

Influenced BioCatalogue, MethodBox and SysMO-SEEK

myExperiment currently has 5378 members, 292 groups, 2273 workflows, 534 files and 217 packs

http://www.myexperiment.org/

Page 6:

http://www.wf4ever-project.org/

Workflow Preservation

Research Objects

Provenance

Recommendation

Astronomy and Genomics

Page 7:

» Scientific workflows enable automation of scientific methods and encourage best practices to be shared
» Workflows need to be preserved for
› Reuse, fundamental for incremental scientific development
› Method reproducibility, key for credit and publication
» Workflow preservation is complex!
» Heterogeneous types of information need to be aggregated, including workflows and related resources forming research objects
» Research objects need to be trusted and understandable n years from now
» Social aspects need to be addressed in order to support reuse in scientific communities

Wf4Ever: Challenges

Preservation of scientific workflows in data-intensive science

Page 8:

Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.

Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.

Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.

Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.

Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.

Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.

Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.

Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.

The R.* dimensions

“Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/

Page 9:

Wf4Ever: Forms of decay

Workflow Decay
• Service decay
  • Flux/decay/unavailability
• Data decay
  • Formats/ids/standards
• Infrastructure decay
  • platform/resources

Experiment Decay
• Methodological changes
• New technologies
• New resources/components
• New data
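One way to picture service decay is as a check over the external services that each workflow step depends on. The sketch below is only an illustration of that idea, not the Wf4Ever stability evaluation: the function `find_service_decay`, the step names and the example.org URLs are all hypothetical, and the availability probe is passed in as a function so that no real network call is assumed.

```python
from typing import Callable, Dict, List


def find_service_decay(
    step_services: Dict[str, str],
    is_available: Callable[[str], bool],
) -> List[str]:
    """Return the names of workflow steps whose service is unavailable.

    step_services maps a step name to the URL of the service it calls.
    is_available is any probe that reports whether a URL still responds.
    """
    return sorted(
        step for step, url in step_services.items() if not is_available(url)
    )


# Hypothetical workflow: two steps, each backed by a remote service.
steps = {
    "fetch_sequence": "http://example.org/services/sequence",
    "align": "http://example.org/services/align",
}

# Stand-in probe for the example: pretend one service has gone offline.
offline = {"http://example.org/services/align"}
decayed = find_service_decay(steps, lambda url: url not in offline)
print(decayed)  # ['align']
```

Injecting the probe keeps the check independent of how availability is measured, so the same function could sit behind a periodic monitor or a one-off audit.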

Page 10:

Preservation, Conservation, Recreating

Preserving
• Archived Record
• Fixed Snapshots
• Review
• Rerun & Replay

Conserving
• Active Instrument
• Live
• Rerun & Reuse
• Repair & Restore

Recreating
• Archived Record
• Active Instrument
• Live
• Rebuild
• Recycle
• Repurpose

Page 11:

Research objects

Page 12:

Research Objects as Social Objects

Page 13:

Research Object model core (simplified)
http://purl.org/wf4ever/ro#

[Figure: simplified RO core diagram. Classes: ro:ResearchObject, ro:Manifest, ro:Resource, ro:AggregatedAnnotation, wfdesc:Workflow. Properties: ore:aggregates, ore:isDescribedBy, ro:annotatesAggregatedResource.]

Note: This figure shows a simplified view of the RO core.

RO specification: http://wf4ever.github.com/ro/
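The classes and properties named on this slide can be pictured as a handful of triples. The sketch below is a minimal illustration in plain Python, with no RDF library. The example.org URIs, the file names and the helper functions are hypothetical. The arrow directions are assumed to follow the published RO vocabulary, since the diagram itself is not reproduced here, and typing the aggregated file directly as wfdesc:Workflow is a simplification.

```python
# Illustrative sketch only: a Research Object written as plain
# (subject, predicate, object) triples. All example.org URIs are hypothetical.
RO = "http://example.org/ro/demo/"         # hypothetical Research Object
MANIFEST = RO + ".ro/manifest.rdf"         # hypothetical manifest location
WORKFLOW = RO + "workflow.t2flow"          # hypothetical workflow file
ANNOTATION = RO + ".ro/annotations/1"      # hypothetical annotation

TRIPLES = [
    (RO, "rdf:type", "ro:ResearchObject"),
    (RO, "ore:isDescribedBy", MANIFEST),
    (MANIFEST, "rdf:type", "ro:Manifest"),
    (RO, "ore:aggregates", WORKFLOW),
    (WORKFLOW, "rdf:type", "ro:Resource"),
    (WORKFLOW, "rdf:type", "wfdesc:Workflow"),  # simplification, see lead-in
    (RO, "ore:aggregates", ANNOTATION),
    (ANNOTATION, "rdf:type", "ro:AggregatedAnnotation"),
    (ANNOTATION, "ro:annotatesAggregatedResource", WORKFLOW),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def annotations_of(resource, triples=TRIPLES):
    """All annotations that annotate the given aggregated resource."""
    return [
        s for s, p, o in triples
        if p == "ro:annotatesAggregatedResource" and o == resource
    ]


def unaggregated_annotation_targets(ro, triples=TRIPLES):
    """Annotation targets that the Research Object does not aggregate."""
    aggregated = set(objects(ro, "ore:aggregates", triples))
    targets = {
        o for s, p, o in triples if p == "ro:annotatesAggregatedResource"
    }
    return sorted(targets - aggregated)


print(objects(RO, "ore:aggregates"))        # the workflow and the annotation
print(annotations_of(WORKFLOW))             # the single annotation
print(unaggregated_annotation_targets(RO))  # [] when the RO is self-consistent
```

The last helper reflects the intent behind the property name ro:annotatesAggregatedResource: an annotation in the RO should point at something the RO itself aggregates.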

Page 14:

Research Object model core
http://purl.org/wf4ever/ro#

Page 15:

RO model: Workflow Description
http://purl.org/wf4ever/wfdesc#

Page 16:

Workflow Provenance (wfprov)
http://purl.org/wf4ever/wfprov#

Page 17:

Technical infrastructure

• Models: Semantic Web Encoding
  • Research Object
  • Annotation
  • Provenance
  • Evolution and Versioning

• Services: Web APIs, REST services
  • Foundational, Extension, User
  • APIs, Architecture

• Principles
  • Map into standards
  • Adopt standards
  • Lightweight components

• Ecosystem
  • Command line
  • Portal
  • Third party systems

Page 18:

Foundation Services

Extension Services

User Clients

Services: The Wf4Ever Proposal

Page 19:

Wf4Ever Reference Implementation (Prototype, Dec 2011)

[Figure: architecture diagram. Labels: Lifecycle Services, Storage Services, Access & Usage Clients, Data Management & Analysis Services, Stability Evaluation, Completeness Evaluation, Recommender, RO Portal, RO Manager Tool, RO Digital Library, ROBox, Dropbox Client, Taverna Workflow Mgmt System.]

Page 20:

Roadmap: Year 1 (Dec 2010 - Dec 2011)

» Exploration (2011)
› Problem specification and requirements identification
› Better understanding of workflow preservation needs from the domains (what does it mean to preserve a scientific workflow?)
› Proofs of concepts
› Preliminary models, components, and integrated reference implementation
› Result identification

Page 21:

Roadmap: Year 2 (Dec 2011 - Dec 2012)

» Realization/validation (2012)
› Validate the models, architectures and software in practice
› Distributed components with different access/security arrangements, forming REST APIs and specifications
› RO Content Campaign: Generate 1000s of ROs
› First productization phase: Stable releases of models and reference implementation
› Decay monitoring and notification (why my workflow is no longer stable), reacting to decay, attribution and credit support beyond recommendation. Detailed use of provenance
› Execution and interoperability support (SHIWA integration)

Page 22:

Roadmap: Year 3 (Dec 2012 - Dec 2013)

» Exploitation (2013)
› Final productization phase
› Deployment in user environments and systems, enhanced with workflow preservation capabilities
› RO-enabled myExperiment
› RO-enabled Galaxy
› RO-enabled Dataverse
› … and more!
› Deployment at publishers, e.g. Elsevier, Digital Science, GigaScience

Page 23:

Collaborations and impact

» SHIWA – Sharing Interoperable Workflows
» Publishers/journals: Elsevier, GigaScience (by BGI)
» OpenPHACTS (nanopublications)
» SCAPE (dataset preservation)
» BioVeL (biodiversity - species preservation!)
» Dataverse (data repository)
» Galaxy (workflow system for genomics)
» GenomeSpace (data integration platform)

Page 24:

Thank you!

Any Questions?

http://www.wf4ever-project.org/

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.