Workflow management: motivation and vision

Ela Hunt, SyBIT (Ela.Hunt@SystemsX.ch)

Plan

Overview of existing workflows

Gains to be achieved via workflows

Methodological assumptions: how to support and construct workflows with less effort and more effectively

Three areas of workflow use:

Deep sequencing

High content screening

Proteomics

Future: workflows combining these three methodologies, possibly including metabolomics, NMR, etc.

Deep sequencing

Management of reads (images) coming off the microscopy devices

Processing of images into sequence files

Alignment to a genome or genome assembly from short reads

Annotation with data from external sources

Candidate gene/drug target identification

Deep sequencing workflow status (Lausanne)

[Figure: workflow diagram. Labels: 1a. Web, sample metadata capture; 1b. Illumina sequencing; 2. File server; 3. Web browse; 4. Submit analysis pipeline; Sequence analysis; Perl; Meta-data; Sequence data. Possible extensions: 6. DAS server; 7. MicrobeBrowser; 8. AssociationViewer.]

Deep sequencing workflow status

Lausanne – alignment via Eland (Emmanuel Beaudoing, Sylvain Pradervand)

Basel – under construction (Manuel Kohler)

Zurich – FGCZ – under construction (Remy Bruggmann)

Proteomics workflows

MS spectra

Mapping to proteins (merging output from various analysis programs)

Annotation with additional data

ETHZ – Perl scripts and KNIME (Andreas Quandt)

Lausanne, Geneva, Basel (?)

ETHZ proteomics example (drawn in KNIME by Andreas Quandt)

Screening workflows

Microscopy, image transfer, compression

Matlab scripts (light intensity adjustment, feature recognition, etc.) that identify features and write feature counts to a DB or files

Statistics and chart generation (KNIME, R, Matlab, etc.), sometimes including a user interface that shows images (also for training)

Screening workflows

Lausanne – Petr Strnad's workflows in KNIME, Matlab, MySQL

iBRAIN, developed by Berend Snijder: an end-to-end solution with a GUI (shell script, XML, XSLT, HTML)

ImageJ in S. Maerkl's lab in Lausanne, needing more automation and a DB

HCDC (PostgreSQL, Matlab, KNIME)

Lausanne workflow fragment

Read available plates

Loop for every plate … read cell data for the plate in the loop

Calculate the number of centrosomes for 7 different thresholds
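
The loop in this fragment can be sketched in Python. This is only an illustration, not the Lausanne KNIME workflow itself: the data layout (a dictionary of plates, each holding per-cell lists of spot intensities), the threshold values, and the names `analyse_plates` and `count_centrosomes` are all assumptions, since the slide does not give them.

```python
from typing import Dict, List

# Hypothetical data layout: plate name -> list of cells, each cell being the
# list of intensities of candidate centrosome spots detected in that cell.
PlateData = Dict[str, List[List[float]]]

# Seven illustrative thresholds; the real values are not given on the slide.
THRESHOLDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]


def count_centrosomes(cell_spots: List[float], threshold: float) -> int:
    """Count the spots in one cell whose intensity reaches the threshold."""
    return sum(1 for intensity in cell_spots if intensity >= threshold)


def analyse_plates(plates: PlateData) -> Dict[str, Dict[float, List[int]]]:
    """For every plate, compute per-cell centrosome counts at each threshold."""
    results = {}
    for plate_name in sorted(plates):      # read available plates
        cells = plates[plate_name]         # read cell data for this plate
        results[plate_name] = {
            t: [count_centrosomes(cell, t) for cell in cells]
            for t in THRESHOLDS
        }
    return results


# Made-up example: one plate with three cells.
example = {"plate_01": [[0.15, 0.65], [0.35], []]}
counts = analyse_plates(example)
print(counts["plate_01"][0.3])
```

Keeping the thresholds in one list means the same loop can be rerun with different cut-offs without touching the per-plate logic.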

iBRAIN overview

Purpose: plates, wells, images => compress images, classify cells into types, count cells of various types, graph

Submit project via drag-and-drop of a file

Monitor progress on cluster via HTML pages

Technology: bash, Matlab, cluster, XML, HTML; web pages are generated from a bash script; paths and file names are embedded in the code

iBRAIN use cases

OUR GOALS: addressing technical challenges

Maintainability (extensibility) of the entire workflow

Portability

Automation (end-to-end execution)

Cost savings via code base sharing

Various architectures (storage, clusters)

Multiple logins (security, ease of administration)

Privacy

Most of these can be solved by extending KNIME (next talk)

Extending KNIME: see workflows wiki page

What is KNIME?

A Java workflow management system

Integrates Python, R, Perl, Java snippets, JDBC

GUI – can be used by a bioinformatician

Also server and cluster products (Sun Grid Engine)

Used at several locations (below: P. Strnad's analysis at Lausanne)

KNIME Analysis (from P. Strnad)

[Figure: GFP-Centrin expression threshold; y-axis: percentage of cells below threshold. Annotations: 50% of cells have 2 centrosomes; usually exclude 10% of cells with low GFP-Centrin signal.]

KNIME Analysis

[Figure: x-axis: centrosome number; y-axis: cell count.]

Image Regions Viewer

Goals of KNIME extension

Maintainability (extensibility)

Portability

Automation (end-to-end execution)

Cost savings via code base sharing

Various architectures (storage, clusters)

Doing away with multiple logins, or with no logins at all (security, ease of administration, privacy)

Security

Security – one username/password per user, one login that carries out the whole workflow

Will include cluster/db logins

KNIME – needs the concepts of user/session, login, accounting of who did what

Allows for workflow tracking, scientific repeatability, accounting

Distributed data and computation

Data Mover as a KNIME node (expose input params, input and output as KNIME ports) – KNIME abstracts over those, and calls them ports

Usage of clusters (LSF and others, as needed) – probably involving the spawning of several Java workflows distributed over a cluster, also reporting of status as jobs are being processed

Language additions

Wrapping for Matlab

Improved wrapping of Perl

Better facilities for R embedding (viewports)

CP2 embedding

Sequence: Eland, MAQ, Bowtie, BWA

Proteomics: Mascot, X!Tandem, OMSSA, SpectraST

GUI additions

Job submission GUI

Job monitoring GUI (to show errors in a manner appropriate for a biological user)

Workflow sharing GUI (choose workflow, associate with data)

GUI embedding facility for Java GUIs (the current implementation is too fiddly)

Workflow portability

A reconfiguration tool, based on the XML workflow description format supported by KNIME, written in XPath or XQuery (GUI?):

select all data paths and change them

select all software paths and change them

select db/login/cluster user data, update

check the updated values by testing all new parameters, report

for two identical workflow instances, report the config differences
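
As a rough illustration of the select-and-update and diff steps, here is a minimal Python sketch using the XPath subset of the standard `xml.etree.ElementTree` module. The XML layout (`<entry key="…" value="…"/>` elements), the setting names, and the functions `replace_prefix` and `config_diff` are assumptions made for the example; real KNIME workflow files are richer, and the step that tests the new parameters is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Simplified, hypothetical workflow description: each setting is an <entry>
# element with "key" and "value" attributes. Real KNIME files are richer.
WORKFLOW_A = """
<config>
  <entry key="data_path" value="/data/lausanne/plates"/>
  <entry key="software_path" value="/opt/matlab/bin"/>
  <entry key="db_user" value="alice"/>
</config>
"""


def settings(root: ET.Element) -> Dict[str, str]:
    """Collect all key/value settings via an XPath-style selection."""
    return {e.get("key"): e.get("value") for e in root.findall(".//entry[@value]")}


def replace_prefix(root: ET.Element, old: str, new: str) -> int:
    """Rewrite every value starting with `old`; return how many changed."""
    changed = 0
    for entry in root.findall(".//entry[@value]"):
        value = entry.get("value")
        if value.startswith(old):
            entry.set("value", new + value[len(old):])
            changed += 1
    return changed


def config_diff(a: ET.Element, b: ET.Element) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Report settings whose values differ between two workflow instances."""
    sa, sb = settings(a), settings(b)
    keys = sorted(set(sa) | set(sb))
    return {k: (sa.get(k), sb.get(k)) for k in keys if sa.get(k) != sb.get(k)}


original = ET.fromstring(WORKFLOW_A)
ported = ET.fromstring(WORKFLOW_A)
n_changed = replace_prefix(ported, "/data/lausanne", "/data/basel")
differences = config_diff(original, ported)
print(n_changed, differences)
```

The same selection expression serves both the update and the diff, which is the main appeal of driving the tool from the XML description rather than from site-specific scripts.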

Better workflow management

An open repository of workflow nodes, shared by all KNIME user groups (two parts – mature and beta)

Saving of graphing parameters, so that an entire workflow can be automated

Adding a workflow start node with iteration over directories

Data flow efficiency - data exchange between nodes – via hierarchical structures (XML?) and tables (for Perl?)

Image handling

Image type improvements (this type is under development and may not be mature yet)

Image storage in openBIS (various levels of resolution, by well, plate, etc), with associated indexes, so that stats at various levels can be generated easily

openBIS/B-Fabric connectivity

Access to raw data from KNIME

Image indexing, so that KNIME can effectively query features

Analysis results storage

Dumping of workflow run parameters/outcomes to DB (maybe picking up a workflow from DB)

SQL handling

Better table merging: merging data from several tables, driven by a query definition, is currently cumbersome
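
To make the requirement concrete, the sketch below shows the kind of merge meant here: two tables combined by a single query definition. It uses Python's standard `sqlite3` module with an in-memory database. The table names, columns, and values are invented for illustration and do not come from any of the workflows described.

```python
import sqlite3

# In-memory database with two hypothetical tables that share a well_id key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wells (well_id INTEGER PRIMARY KEY, plate TEXT, treatment TEXT);
    CREATE TABLE features (well_id INTEGER, cell_count INTEGER);
    INSERT INTO wells VALUES (1, 'P1', 'control'), (2, 'P1', 'siRNA');
    INSERT INTO features VALUES (1, 120), (2, 85);
""")

# The query definition that drives the merge: one join on the shared key.
MERGE_QUERY = """
    SELECT w.plate, w.treatment, f.cell_count
    FROM wells AS w
    JOIN features AS f ON f.well_id = w.well_id
    ORDER BY w.well_id
"""
merged = conn.execute(MERGE_QUERY).fetchall()
print(merged)
conn.close()
```

A workflow node that accepted such a query as its configuration would replace a chain of pairwise join steps with one declarative definition.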

Summary

KNIME is used in Zurich and Lausanne, but does not provide end-to-end processing

List of new requirements was gathered from workflow users

An outline grant submitted to KTI

Your input is needed!