Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Transcript of Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
![Page 1: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/1.jpg)
Workflows, Provenance & Reporting A Lifecycle Perspective
Professor Carole Goble FREng FBCS
The University of Manchester, UK
3rd – 6th September 2013, Rome, Italy
![Page 2: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/2.jpg)
The Scientific and Technical Ecosystem
Mobilising Big and Broad Data
• Streaming
• Sweeps through models
• Integrative analysis
• Results synthesis
• Heavy compute
Interoperability, plugging together
• Multi step chains, Multi software / data
• Mixed resources / platforms
• Incompatibility smoothing
• Trans-disciplinary, Alien processes
[DataONE]
![Page 3: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/3.jpg)
BioSTIF
inputs: data, parameters, configurations
outputs
Workflow nutshell
• A series of automated / interactive data analysis steps
• Process data at scale
• Import data / codes from one’s own research and/or from existing libraries
• Pipelines & analytic and synthesis procedures
• Chains of components
• Bridges between resources
• Shield from change and operational complexity
• Releasing capacity
Services
Resources
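The "series of automated steps" with inputs (data, parameters, configurations) and outputs can be pictured with a minimal sketch. This is only an illustration of the idea, not how any particular workflow system works: `run_workflow` and the three steps (`load`, `filter_short`, `count`) are hypothetical names made up for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is a (name, function) pair; each function takes the current data
# and the shared parameters, and returns the data for the next step.
Step = Tuple[str, Callable[[Any, Dict[str, Any]], Any]]


def run_workflow(steps: List[Step], data: Any, params: Dict[str, Any]) -> Tuple[Any, List[str]]:
    """Run the steps in order, feeding each output into the next step."""
    log: List[str] = []
    for name, func in steps:
        data = func(data, params)
        log.append(f"{name}: ok")
    return data, log


# Hypothetical steps, for illustration only.
def load(records, params):
    return [r.strip() for r in records]


def filter_short(records, params):
    return [r for r in records if len(r) >= params["min_length"]]


def count(records, params):
    return len(records)


steps = [("load", load), ("filter", filter_short), ("count", count)]
result, log = run_workflow(steps, [" ACGT ", "AC", "ACGTAC "], {"min_length": 4})
print(result, log)
```

Swapping a step or changing `params` gives a variation of the same chain without touching the other steps.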
![Page 4: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/4.jpg)
[Figure labels: Provisioning; Workflows; Appln Service; Users; Composition; Incorporation; Invocation]
Applications
• Applications components of workflows
• Compose applications into workflows
• Incorporate workflows into applications
Infrastructure
• Provision physical resources to support application workflows
• Coordinate resources through workflows
• Optimise and adapt to change
[Foster 2005]
Workflows
Wfms
![Page 5: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/5.jpg)
Assembly of Components
Interoperability
Covering up incompatibility
![Page 6: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/6.jpg)
Flexible variation
Stabilising
Optimising
![Page 7: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/7.jpg)
Workflows: maturing approach
Underpin integrative platforms.
Established in many disciplines, notably chemistry and biology, esp. ‘omics: assembly, synthesis, annotation, analytics.
Overlaps with metagenomics, phylogenetics and genetic ecology
Powering service based science and science as a service http://www.globus.org/genomics/solution
Sandve, Nekrutenko, Taylor, Hovig Ten simple rules for reproducible in silico research, PLoS Comp Bio submitted
![Page 8: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/8.jpg)
Ecological Niche modelling, population modelling, Metagenomics and Phylogenetics ‘omics pipelines and analytic workflows http://www.biovel.eu
Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis http://camera.calit2.net/index.shtm
Combine species occurrence data with global climate, terrain and land cover information, to identify environmental correlates of species ranges. http://www.lifemapper.org/species
BioDiversity
![Page 9: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/9.jpg)
Taxonomic Data Refinement
www.biovel.eu
• Synonym expansion
• Taxonomic name resolution
• Occurrence retrieval
• Spell checking
• Geographic and taxonomic cleaning
• Temporal refinement
• Data processing log
[Matthias Obst, INTECOL 2013]
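A rough sketch of how a few of these refinement steps could chain together, with a processing log kept alongside. It is not the BioVeL workflow. The reference lists, record fields and cutoff are assumptions for illustration; the synonym pair is the one given on the model sweep slide (Bupalus piniarius, syn. B. piniaria).

```python
import difflib

# Hypothetical reference data, for illustration only.
ACCEPTED_NAMES = ["Bupalus piniarius", "Pilumnus hirtellus"]
SYNONYMS = {"Bupalus piniaria": "Bupalus piniarius"}


def resolve_name(name):
    """Return (accepted_name, note); accepted_name is None if unresolved."""
    if name in SYNONYMS:
        return SYNONYMS[name], "synonym resolved"
    if name in ACCEPTED_NAMES:
        return name, "accepted"
    # Spell checking: accept only a very close match (cutoff is an assumption).
    close = difflib.get_close_matches(name, ACCEPTED_NAMES, n=1, cutoff=0.85)
    if close:
        return close[0], "spelling corrected"
    return None, "unresolved name"


def refine(records, year_from, year_to):
    """Apply name, geographic and temporal checks; log every decision."""
    kept, log = [], []
    for i, rec in enumerate(records):
        name, note = resolve_name(rec["name"])
        if name is None:
            log.append(f"record {i}: dropped ({note})")
        elif not (-90 <= rec["lat"] <= 90 and -180 <= rec["lon"] <= 180):
            log.append(f"record {i}: dropped (invalid coordinates)")
        elif not (year_from <= rec["year"] <= year_to):
            log.append(f"record {i}: dropped (outside time window)")
        else:
            kept.append({**rec, "name": name})
            log.append(f"record {i}: kept ({note})")
    return kept, log


records = [
    {"name": "Bupalus piniaria", "lat": 60.2, "lon": 24.9, "year": 2005},
    {"name": "Pilumnus hirtelus", "lat": 57.7, "lon": 11.9, "year": 2010},
    {"name": "Pilumnus hirtellus", "lat": 123.0, "lon": 11.9, "year": 2010},
    {"name": "Pilumnus hirtellus", "lat": 57.7, "lon": 11.9, "year": 1950},
]
kept, log = refine(records, 2000, 2013)
for line in log:
    print(line)
```

The log is what later feeds provenance and reporting: every dropped or corrected record has a stated reason.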
![Page 10: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/10.jpg)
Data Operations in Workflows in the Wild
Analysis of 260 publicly available workflows in Taverna, WINGS, Galaxy and Vistrails
Garijo et al Common Motifs in Scientific Workflows: An Empirical Analysis, in press, FGCS
![Page 11: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/11.jpg)
Large Scale Ecological Niche Modeling Workflow
Step 1: Explorative modeling
- Use unfiltered data
- Use fixed parameters: Mahalanobis distance (Farber and Kadmon 2003)
- Native projections
- Test the model, distribution of points, number of points
Step 2: Deep modeling
- Filtering environmentally unique points with BioClim algorithm (Nix 1986)
- ENM with Support Vector Machine (Cristianini & Shawe-Taylor 2000) and Maximum Entropy (Phillips 2004)
- Parameter optimization (if necessary) on the model test results
- 2 masks (model generate, model project)
Data discovery
Data assembly, cleaning, and refinement
Ecological Niche Modeling
Statistical analysis
Analytical cycle
Pilumnus hirtellus
Enclosed sea problem (Ready et al., 2010)
[Matthias Obst, INTECOL 2013]
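For orientation, the Mahalanobis distance used in the explorative step measures how far a site lies from the centre of the occurrence points in environmental space, scaled by their covariance. The sketch below shows only that distance for two environmental variables. It does not reproduce the Farber and Kadmon method or the workflow itself, and the occurrence values are made up.

```python
import math


def mean_and_cov(points):
    """Mean and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point from the mean under cov."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # d^2 = v^T * inverse(cov) * v, written out for the 2x2 case.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


# Hypothetical occurrences as (temperature, precipitation) pairs.
occurrences = [(10.0, 100.0), (12.0, 120.0), (14.0, 100.0), (12.0, 80.0)]
mean, cov = mean_and_cov(occurrences)
print(mahalanobis((14.0, 100.0), mean, cov))
```

Smaller distances mean a site is environmentally closer to where the species has been observed.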
![Page 12: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/12.jpg)
Workflow-enabled science
• Common Templates
• Prepared components
• Systematic assembly
• (Steered) automation
• Hybrid combinations
• Variations
• Extensibility
• Customisation
• Parameterisation
• Repeats
• Cross-run synthesis
• Routine, pooled methods
• Tracking
![Page 13: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/13.jpg)
Repeated model sweeps
Ten insect species were modelled:
• European spruce bark beetle - Ips typographus L.
• Bordered white moth (syn. pine looper) - Bupalus piniarius L. (syn. B. piniaria L.)
• Pine-tree lappet - Dendrolimus pini L.
• Mottled umber - Erannis defoliaria Clerck
• Nun moth - Lymantria monacha L.
• Winter moth - Operophtera brumata L.
• Pine beauty moth - Panolis flammea Den. & Schiff
• Green oak tortrix - Tortrix viridana L.
• European pine sawfly - Neodiprion sertifer Geoffr.
• Common pine sawfly - Diprion pini L.
Tortrix viridana
Image by Kimmo & Seppo Silvonen
Lymantria monacha
data
configuration
parameters
steps
[Päivi Lyytikäinen-Saarenmaa presentation, INTECOL 2013]
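A repeated sweep amounts to running the same steps over every combination of data, configuration and parameters. A minimal sketch, assuming a hypothetical `run_model` stand-in and made-up algorithm and threshold values (the slide does not say which settings were swept):

```python
from itertools import product


def run_model(species, algorithm, threshold):
    """Hypothetical stand-in for one model run; records what was run."""
    return {"species": species, "algorithm": algorithm, "threshold": threshold}


# Three of the ten species from the slide; settings are illustrative only.
species = ["Ips typographus", "Lymantria monacha", "Tortrix viridana"]
algorithms = ["svm", "maxent"]
thresholds = [0.5, 0.8]

# One run per (species, algorithm, threshold) combination.
runs = [run_model(s, a, t) for s, a, t in product(species, algorithms, thresholds)]
print(len(runs))
```

Because every run is recorded with its settings, results can later be pooled or compared across runs.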
![Page 14: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/14.jpg)
http://www.jisc.ac.uk/whatwedo/campaigns/res3/jischelp.aspx
Workflows
workflows
results
provenance: process (log), results (origin)
Reporting
• Record of science
• Reproducibility
• Transparent process
Integrate with reporting systems
Know how
Training
See Penev presentation
![Page 15: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/15.jpg)
Provenance: the link between computation and results
• W3C PROV model standard
• record for reporting
• compare diffs/discrepancies
• provenance analytics
• track changes, adapt
• partial repeat/reproduce
• carry attributions
• compute credits
• compute data quality/trust
• select data to keep/release
• optimisation and debugging
PDIFF: comparing provenance traces to diagnose divergence across experimental results [Woodman et al, 2011]
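To make "compare diffs/discrepancies" concrete, here is a small sketch that records two runs as traces and reports the first step where they diverge. It is not the PDIFF algorithm and not a conforming W3C PROV serialisation. The field names only loosely borrow PROV terms (`used`, `generated`, `wasAssociatedWith`), and the file names, parameter values and agent name are made up.

```python
def record_step(trace, activity, used, generated, agent):
    """Append one provenance record: what ran, on what, producing what, by whom."""
    trace.append({
        "activity": activity,
        "used": dict(used),
        "generated": generated,
        "wasAssociatedWith": agent,
    })


def diff_traces(a, b):
    """Return the first step at which two traces differ, or None if identical."""
    for i, (sa, sb) in enumerate(zip(a, b)):
        changed = sorted(k for k in sa if sa[k] != sb.get(k))
        if changed:
            return {"step": i, "activity": sa["activity"], "changed": changed}
    if len(a) != len(b):
        return {"step": min(len(a), len(b)), "activity": None, "changed": ["length"]}
    return None


# Two hypothetical runs that differ only in one cleaning parameter.
run1, run2 = [], []
record_step(run1, "clean", {"input": "occurrences.csv", "cutoff": 0.8}, "clean-v1", "alice")
record_step(run1, "model", {"input": "clean-v1"}, "model-A", "alice")
record_step(run2, "clean", {"input": "occurrences.csv", "cutoff": 0.9}, "clean-v2", "alice")
record_step(run2, "model", {"input": "clean-v2"}, "model-B", "alice")

divergence = diff_traces(run1, run2)
print(divergence)
```

Here the differing results trace back to the changed cutoff in the cleaning step, which is the kind of diagnosis the slide refers to.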
![Page 16: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/16.jpg)
[Freire] http://www.aosabook.org/en/vistrails.html
Collecting -> Using Provenance
Instrumenting, cross-tool interoperability
Reporting at different scales
![Page 17: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/17.jpg)
Publishing with Provenance
![Page 18: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/18.jpg)
Summary: Infrastructure Productivity
Customise, Process, Integrate
Environment: Legacy, others' and your own software, datasets, services, codes, and platforms. Optimise and manage use of computing infrastructure, HPC, clouds and platforms.
WFMS middleware: Support the design, configuration and execution of workflows. Manage utility actions for data, logging, security, compute, errors… Shield incompatibilities / complexity / change.
Workflow: Parameterised, integrative, multi-step (data) pipelines, analytics, computational protocols that can be repetitively reused. Dependency-rich interoperability.
Apps: Domain/task specific apps that incorporate (an ecosystem of) workflows.
![Page 19: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/19.jpg)
Summary: User Productivity: Capability Raising
Access: Framework to access and leverage heterogeneous legacy applications, services, datasets and codes, and combine with yours. Shielding from complexity.
Customise: Rapid development: Flexibility, Extensibility, Adaptability, Reuse. Reusable Workflow Components.
Process: Integration, reusable workflows/components. Automated plumbing + Interaction. Systematic, repetitive and unbiased analysis and processing and error handling. Ensembles, comparisons, “what ifs”.
Record: Process reporting. Citation tracking. Reproducibility, Provenance, Audit. Quality Control. Standard Operating Procedures.
![Page 20: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/20.jpg)
Workflow Commodities
building cohorts, capturing traits, explicit reporting, clear instructions
• Workflow templates
• Workflow sets
• Libraries of sub workflow parts
• Design practices for mix, match and reuse
• Future proofed design predicting need to adapt
• Discovery and exchange
• Workflow engineers
• Workflow custodians
![Page 21: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/21.jpg)
Seeding a workflow library
![Page 22: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/22.jpg)
Workflow Commodities exchanging, curating, preserving, packaging, life cycle management
http://www.researchobject.org
http://www.dcc.ac.uk
![Page 23: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/23.jpg)
Katy’s student’s 200 hours
Tracking where data went
Workflow Commodities
getting credit, capability, engineers and custodians
![Page 24: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/24.jpg)
Application Building
user variety, outcome focused
• Right apps, right users.
• Commodity apps:
– Web. Spreadsheets. R.
• Customisation
• Mixed workflow / scripting
• Deployment / Portability
– Web based / desktop
– Virtualised deployments
– Cloud hosted service
– A cloud-enabled local host
• Local ownership
• Capability building
[Figure labels: Workflow Visibility; BioDiversity Concept Knowledge; Technology/Infrastructure; Versatility; Low; High; Domain Scientist; Computational Scientist; Technical specialists; Policy makers; Custom Specific Apps; General Toolkits]
![Page 25: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/25.jpg)
Who are the users?
• Policy makers?
• Biodiversity researcher?
• Computational scientist?
• Tool developer?
• Service provider?
• Infrastructure provider?
• Digital custodian?
![Page 26: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/26.jpg)
Workflow management systems
• Integrated into community frameworks, coupled into tools
• Virtualised (Web) Services
• Scaling, Optimisation
• Interoperability, Using provenance
• No one workflow language/system
• Specialisation & its cost
• Plug-ins for common community platforms and resources
• Mitigating and adapting to changes in infrastructures and resources.
• Sustainability and engineering
Generic
Specific
http://www.erflow.eu/
![Page 27: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/27.jpg)
Population dynamics
The life cycle of infrastructures
• Dynamics: Mitigate, Adapt, Disperse, Die
• Standard and maintained prog. interfaces (APIs)
• Standard formats and ids
• Stability, reliability, repair
• Interoperability
• Semantic descriptions
• Sustainability of services and infrastructure
• Instrument resources for citation & microattribution
• Coupled services and infrastructure.
![Page 28: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/28.jpg)
Impact of dependencies
[Zhao et al. Why workflows break e-Science 2012]
![Page 29: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/29.jpg)
Summary
Scale. Standards: data formats, programmatic interfaces. Governance.
Workflow commodities
Design practices
Credit
A seamless, pluggable service. Scale. Adaptability. Specific-Generic tension. Putting provenance to use for data credit.
Embedding workflows in common applications
Integration into reporting and publishing lifecycles
![Page 30: Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome](https://reader034.fdocuments.us/reader034/viewer/2022052505/554e737db4c9054a698b4c27/html5/thumbnails/30.jpg)
BioDiversity Virtual e-Laboratory www.biovel.eu
Wf4Ever www.wf4ever-project.org
SysMO www.sysmo-db.org
SCAlable Preservation Environments http://www.scape-project.eu