10 Best Practices for Workflow Design

Presented at the 2nd BioVeL Workshop on taxonomic and phylogenetic workflows (http://www.biovel.eu/index.php?option=com_content&view=article&id=43:ms6-workshop&catid=22:biovel-meetings&Itemid=122)

  • 1. The 10 Best Practices for Workflow Design. BioVeL M6 Workshop, Göteborg, May 10-11, 2012. Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble (myGrid). Thanks: BioSemantics Group (LUMC), myGrid team (UoM), Yassene Mohamed, Harish Dharuri (LUMC)

  • 2. Our specialty: Knowledge Discovery (http://biosemantics.org)
      • Disambiguation*
      • Text Mining
      • Substrates for Knowledge Discovery
      • Methods for Knowledge Discovery
      • Applications: predict protein-protein and protein-disease associations, gene prioritization; genotype-phenotype studies, e.g. Huntington's Disease, Metabolic Syndrome; yours?
      • * Global disambiguation initiative: http://snipurl.com/conceptweballiance

  • 3. Introduction: Why build good workflows? Good workflow design = good science!

  • 4. Introduction: Best practices for workflow design. Best Practices for workflow design = Best Practices in experimental science + Best Practices in software engineering

  • 5. Best practice 1: Make a sketch workflow

  • 6. Best practice 1: Sketch an Abstract Workflow. PowerPoint courtesy of Eleni Mina

  • 7. Best practice 2: Use modules

  • 8. http://www.myexperiment.org/workflows/74.html

  • 9. Best practice 3: Think about the output (and the data in your workflow in general)

  • 10. Best practice 3: Think about the output? http://...

  • 11. Best practice 4: Provide example inputs and outputs

  • 12. Recipe
      • Taverna 2.3: Select input/output; select tab Details; click Annotation; Add Example
      • Taverna 2.4: Right-click input/output; select Annotation; Add Example

  • 13. Best practice 5: Annotate

  • 14. Best practice 5: Annotate. Each component in Taverna can be annotated

  • 15. Best practice 5: Annotate and help your users

  • 16. Best practice 6: Make workflow executable from outside the local environment

  • 17. Best practice 6: Make workflow executable by others
      • How to check that others can execute your workflow? Try it!
          • Proof of executability
          • Ask a colleague
          • Use an external t2web runner
      • Tips
          • Use Web Services
          • If you use local command line tools: install the tools on a publicly accessible server (e.g. applies to Rserve), or use a system that your users can set up (e.g. BioLinux)

  • 18. Best practice 7: Choose services carefully

  • 19. Best practice 7: Choose services carefully

  • 20. Best practice 7: Choose services carefully

  • 21. Best practice 8: Reuse existing workflows

  • 22. Best practice 8: The reuse workflow
      • Not a best practice, but a tip: know-how is important for reuse
      • [Flowchart] Steps: check workflows on myExperiment; use scripts from colleagues; check services on BioCatalogue; search the internet; invent a new wheel. Branch labels: Pos., Neg., Contact authors, Retry
      • Reuse, attribute, respect licences

  • 23. Best practice 9: Advertise

  • 24. Advertise: Unique reference for your papers and for others to cite

  • 25. Best practice 10: Maintain

  • 26. Best practice 10: Maintain
      • Best practices to support maintenance
          • Regularly check your workflow
          • Ask colleagues
      • Enable support for maintenance
          • Register your workflow on myExperiment
          • Register Web Services on [...]
          • Enable peers to repair: annotate!
      • Note about versioning
          • No need to register all edits on myExperiment: use Subversion
          • Register important updates on myExperiment

  • 27. Bonus tip: Use common sense as a scientist

  • 28. Workflow Forever: Preservation of good workflows for future applications
      • Workflow 74: Protein Discovery (2005)
      • Workflow 2876: Match gene lists by literature (2012)
      • Workflow 2805: Get Pathway genes (2012)

  • 29. Wf4Ever: Outcomes for BioVeL
      • myExperiment 2.0
      • BioCatalogue
      • Taverna
      • Research Objects
      • Linked Data
      • Methods
      • Protocols for Preservation and Conservation

  • 30. The 10 Best Practices of Workflow Design. Thank you for your attention. More information: http://snipurl.com/workflowbestpractices
      1. Make a sketch workflow
      2. Use modules
      3. Think about the output
      4. Provide example inputs and outputs
      5. Annotate
      6. Make it executable from outside the local environment
      7. Choose services carefully
      8. Reuse existing workflows
      9. Advertise
      10. Maintain

  • 31. Wf4Ever tooling: Sneak preview

  • 32. Supporting information: Workflow jargon
      • Scientific workflow: Paradigm to describe, manage, and share complex scientific analyses
      • Workflow system: Software to design, execute, and monitor scientific workflows
      • Module = nested workflow = workflow in a workflow = workflow component
      • Beanshell script: A Java-based scripting language. Typically used for data type conversions in Taverna.
      • Provenance: History or trace of a workflow run. Allows you to look at intermediate data, which workflows and services were run, with what data.
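
The reuse flow of Best practice 8 (slide 22) can be read as an ordered search that only falls back to building something new when every existing source comes up empty. The following is a minimal Python sketch of that reading, not the presenters' method. It assumes the order myExperiment, colleagues' scripts, BioCatalogue, internet; the flowchart layout is not fully recoverable from this transcript, so that order is a guess. The names `find_reusable` and `attribution_note` and the toy catalogues are hypothetical, and no real myExperiment or BioCatalogue API is called.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each source is a (label, lookup) pair. The lookup functions are
# hypothetical stand-ins for a manual or scripted search; they are
# not real myExperiment or BioCatalogue API calls.
Lookup = Callable[[str], Optional[str]]
Source = Tuple[str, Lookup]


def find_reusable(task: str, sources: Sequence[Source]) -> Tuple[str, str]:
    """Return (origin, resource) for the first source that has a match.

    Falls back to ("new", task) only when every source comes up empty,
    so "invent a new wheel" is the last resort.
    """
    for label, lookup in sources:
        hit = lookup(task)
        if hit is not None:
            return label, hit
    return "new", task


def attribution_note(origin: str, resource: str) -> str:
    """Build a reminder to attribute reused material and check its licence."""
    if origin == "new":
        return f"Built from scratch: {resource}"
    return (
        f"Reused {resource} from {origin}: "
        "attribute the authors and respect the licence"
    )


if __name__ == "__main__":
    # Toy catalogues standing in for the real repositories (assumed content).
    workflows = {"pathway genes": "workflow 2805"}
    services = {"sequence alignment": "an alignment web service"}

    # Assumed search order; adjust it to match your own reading of the flow.
    sources = [
        ("myExperiment", workflows.get),
        ("colleagues", lambda task: None),
        ("BioCatalogue", services.get),
        ("internet", lambda task: None),
    ]
    for task in ("pathway genes", "sequence alignment", "novel analysis"):
        print(attribution_note(*find_reusable(task, sources)))
```

Keeping the sources as an ordered list makes the "reuse first" priority explicit and easy to reorder, and the separate attribution step mirrors the slide's closing advice to reuse, attribute, and respect licences.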