Visual Compression of Workflow Visualizations with Automated Detection of Macro Motifs
Eamonn Maguire, Philippe Rocca-Serra, Susanna-Assunta Sansone, Jim Davies and Min Chen
University of Oxford e-Research Centre
University of Oxford Department of Computer Science
VIS 2013, 13th-18th October 2013
Some terminology
Motif: a commonly observed subgraph.
Macro: a single instruction that expands automatically into a more complex set of instructions.
Workflow: literally a flow of work, showing the processes enacted from start to finish in, say, business processes, software execution, analysis procedures or, in our case, biological experiments.
Workflows are used to enable reproducibility.
e.g. VisTrails in our VIS community: 40,000 downloads
Very commonly used in: biology (protein-protein interaction, transcription/regulation networks); chemistry; and even visualization (e.g. VisComplete)
Roadmap
Workflow
Automatically detect motifs
Substitute motifs with 'macros'
Blockades
Current motif detection algorithm limitations:
No semantics
Limited motif sizes (max 10)
Deciding what should be a macro:
Macros in electronic circuit diagrams are the product of years of refinement.
Macros in biological workflows, for instance, are new. How do we determine what should be a macro?
Example case: Biology
Extension on Previous Work
Taxonomy-based Glyph Design (Maguire et al., 2012, IEEE TVCG)
Visualizing (ISA-based) workflows of biological experiments
A Typical Biological Experiment
Hypothesis, Experiment, Analysis, Results & Paper
Representing an Experiment - Workflows!
Reproducibility!
KEY: material, protocol, chemical, data
Source name
Sampling Protocol
Sample name
Chemical Label
Labeling Protocol
Labeled Extract
Hybridisation Protocol
Assay Name
Scanning Protocol
Raw Data File
Feature Extraction Protocol
Processed Data File
Workflows describe the flow of work from a biological sample to the data file.
The workflow varies between technologies, but the steps have a lot in common.
For example, the labeling step is very common in DNA microarray experiments.
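As a rough illustration, the experiment workflow listed above can be held as a small typed graph. This is only a sketch: the kind assigned to each step, and the assumption that the Chemical Label feeds into the Labeling Protocol as a second input, are readings of the slide's key rather than anything stated in the talk.

```python
from collections import defaultdict

# Main chain of steps, in the order listed on the slide.
MAIN = [
    "Source name", "Sampling Protocol", "Sample name", "Labeling Protocol",
    "Labeled Extract", "Hybridisation Protocol", "Assay Name",
    "Scanning Protocol", "Raw Data File", "Feature Extraction Protocol",
    "Processed Data File",
]

# Node kinds from the key (material, protocol, chemical, data).
# Which kind each step gets is an assumption made for this sketch.
KIND = {name: "protocol" for name in MAIN if name.endswith("Protocol")}
KIND.update({
    "Source name": "material", "Sample name": "material",
    "Labeled Extract": "material", "Chemical Label": "chemical",
    "Assay Name": "data", "Raw Data File": "data",
    "Processed Data File": "data",
})

# Directed edges: the main chain, plus the chemical as a side input (assumed).
EDGES = list(zip(MAIN, MAIN[1:])) + [("Chemical Label", "Labeling Protocol")]

def in_degree(edges):
    """Count incoming edges per node."""
    counts = defaultdict(int)
    for _, target in edges:
        counts[target] += 1
    return counts

# Nodes where two or more inputs merge.
merge_points = [n for n, d in in_degree(EDGES).items() if d > 1]
```

Under these assumptions the only merge point is the labeling step, which is the kind of recurring structure a motif search would pick up across many workflows.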
Our Process
[Figure: process pipeline. Stages: biological workflow repository; motif extraction algorithm; ranking algorithm; macro selection via UI by a domain expert; selected macros; glyph design; macro annotation; macro insertion in graph. Example motif scores shown: 2.87 (occurrence 1092, workflows 476, compression 3276); 2.4 (occurrence 600, workflows 240, compression 2400); -2.43 (occurrence 20, workflows 10, compression 200). Pattern labels: Branch & Merge.]
Workflow Repository
9,670 Biological Experiment Workflows
Why such a large number? We can statistically make suggestions to users about which motifs could be macros, based on a number of metrics (detailed later), and we can robustly test our algorithm's performance across a huge cross-section of experiments.
Motif Extraction Algorithm
The Problem... Current Motif Extraction Algorithms
FANMOD, mFinder etc.
The Current Weaknesses
No semantics (edge or node)
Small node limit, normally < 10
Imagine n-grams with no information other than topology, e.g. bi-grams of DNA 'motifs' where instead of A-T, T-C, T-G we get x-x, x-x, x-x.
What's up?
We can't infer function from these results.
Ah, and you can't have macros without function...
Exactly!
Unable to infer function, unable to produce a macro.
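The n-gram analogy can be made concrete with a few lines of code. This is a toy sketch, not the behaviour of FANMOD or mFinder: the sequence "ATCTG" is a made-up example, and erasing labels to "x" stands in for a topology-only search.

```python
from collections import Counter

def bigrams(seq):
    """Count adjacent pairs in a sequence."""
    return Counter(zip(seq, seq[1:]))

sequence = "ATCTG"  # hypothetical DNA fragment

# With labels: four distinct bi-grams (A-T, T-C, C-T, T-G).
labelled = bigrams(sequence)

# Topology only: every symbol becomes "x", so every bi-gram is x-x.
topology_only = bigrams("x" * len(sequence))
```

With labels, four different bi-grams are distinguishable; without them, all four collapse into a single x-x pattern, which is why no function can be inferred from the result.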
Solution
[Figure: state-transition model with states s0 to s4 and transitions A to H. Legend: a starting state; a normal state and a holding state, each shown with either a pseudo-motif or a 'legal' motif; a transition that generates a motif; a transition that does not generate a motif.]
More detail about each individual case, A-H, is available in the paper.
Resulting In...
From our algorithm, running over 9,670 workflows, we retrieved ~12,000 motifs up to depth 12.
Semantically aware
Limited by depth, not node count: we have motifs with > 80 nodes
Essentially, more complicated, topologically sensitive n-grams
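To give a feel for "semantically aware, limited by depth", here is a deliberately simplified sketch. It is not the state-transition algorithm from the paper: it only enumerates linear paths (no branching or merging), and the function name, node ids and labels are all hypothetical.

```python
from collections import Counter

def labelled_paths(edges, labels, max_depth):
    """Count label sequences along directed paths of 2..max_depth nodes."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)

    found = Counter()

    def walk(node, path):
        path = path + (labels[node],)
        if len(path) >= 2:
            found[path] += 1          # record every path of at least 2 nodes
        if len(path) < max_depth:     # the depth bound, not a node-count bound
            for nxt in successors.get(node, []):
                walk(nxt, path)

    for start in labels:
        walk(start, ())
    return found

# Hypothetical four-step workflow: material -> protocol -> material -> protocol
labels = {1: "material", 2: "protocol", 3: "material", 4: "protocol"}
edges = [(1, 2), (2, 3), (3, 4)]
motifs = labelled_paths(edges, labels, max_depth=3)
```

Because node labels are part of each path, "material, protocol" and "protocol, material" are counted as different motifs, which a topology-only search could not do.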
Ranking Algorithm... because 12,000 is just too many.
Ranking Algorithm
M1 - Occurrences in data repository: 1,043
M2 - Workflow presence: 640...
M3 - Compression potential
For At, Aw and Ac, we map each to a fixed range [−1, 1] using a linear mapping based on the min-max range of each indicator, yielding three normalized metrics M1, M2 and M3.
No algorithm would be complete without a weighting element, so each metric can be weighted. We use a default weight of 1.
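A minimal sketch of the normalization and weighting step follows. The slides do not say how the three weighted metrics are combined, so the weighted sum here is an assumption, as are the function names and the choice to return 0 when an indicator has no spread. The raw values are the three examples visible in the process figure; since the min-max ranges here come from only those three motifs rather than the full set of ~12,000, the scores will not match the ones on the slides.

```python
def normalize(value, lo, hi):
    """Linearly map value from [lo, hi] to [-1, 1]."""
    if hi == lo:
        return 0.0  # assumption: no spread means a neutral score
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def rank(motifs, weights=(1.0, 1.0, 1.0)):
    """Rank motifs by a weighted sum of normalized metrics (assumed rule).

    motifs maps a name to (occurrences, workflow presence, compression).
    """
    columns = list(zip(*motifs.values()))
    ranges = [(min(col), max(col)) for col in columns]
    scores = {
        name: sum(w * normalize(v, lo, hi)
                  for w, v, (lo, hi) in zip(weights, raw, ranges))
        for name, raw in motifs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Raw indicator values as shown in the process figure; names are hypothetical.
motifs = {
    "m1": (1092, 476, 3276),
    "m2": (600, 240, 2400),
    "m3": (20, 10, 200),
}
ranked = rank(motifs)
```

With default weights of 1, each score lies in [-3, 3]; raising one weight lets a domain expert favour, say, compression potential over raw occurrence counts.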
Ranking Algorithm
[Figure: motif selection interface. Each motif entry shows 3 normalized metrics (occurrences, workflow presence, compression potential), a score, the motif subgraph and 3 glyph representations. Further labels: Downgrade Icon, Adjusted Score. Callouts: Subset of 1000, 1200, 200.]
Filter by pattern presence: linear, branching and merging
Filter by min/max depth
Motifs arranged by depth
Depth 6 motifs with magnified view in B and detailed popup of selected motif in D
Glyph Design
Things we'd like to see...
Topology/structure within a macro
Node type
Density
Annotation
[Figure: three glyph representations, each labelled with the properties it encodes: node type (colour or colour/shape), topology (arrangement or overall), length, breadth and annotation; shown alongside state-transition model examples.]
Macro Insertion for Workflow Compression
[Figure: workflow compression by macro insertion, built up over panels A, B, C and D.]
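The idea of macro insertion can be sketched on the simplest possible case, a linear workflow held as a list of step names. This is an illustration only: real workflows are graphs with branching and merging, and the function name, macro name and chosen motif below are hypothetical.

```python
def insert_macro(steps, motif, macro_name):
    """Replace each non-overlapping occurrence of motif in steps with macro_name."""
    out, i, n = [], 0, len(motif)
    while i < len(steps):
        if n > 0 and steps[i:i + n] == motif:
            out.append(macro_name)  # collapse the whole motif into one node
            i += n
        else:
            out.append(steps[i])
            i += 1
    return out

workflow = [
    "Sample name", "Labeling Protocol", "Labeled Extract",
    "Hybridisation Protocol", "Assay Name",
]
motif = ["Labeling Protocol", "Labeled Extract", "Hybridisation Protocol"]

compressed = insert_macro(workflow, motif, "LABEL+HYBRIDISE")
nodes_saved = len(workflow) - len(compressed)
```

Here three nodes collapse into one macro node, so the compressed workflow is two nodes shorter than the original.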
Evaluation
User Testing
Performance
Community Dissemination
Dissemination of macros to the community
Automacron API available as an OSGi plugin for ISAcreator
Roadmap
Workflow
Automatically detect motifs
Substitute motifs with 'macros'
Overcoming the blockades
Current motif detection algorithm limitations:
No semantics
Limited motif sizes (max 10)
Overcome with: new semantically enabled algorithm
Deciding what should be a macro:
Macros in electronic circuit diagrams are the product of years of refinement.
Macros in biological workflows, for instance, are new. How do we determine what should be a macro?
Overcome with: statistically informed selection from analysis of a large corpus of workflows
Summary
New semantically enabled motif discovery algorithm
Statistically informed selection of macro candidates for use in biological workflow visualizations
Automated macro image generation inferred from algorithm states
Integration of final selections and utility to compress in ISAcreator tool for curators and biologists alike
Open source - we want you to extend!
github.com/isa-tools/automacron
Co-authors: Philippe Rocca-Serra, Susanna-Assunta Sansone, Jim Davies, Min Chen
Also Alejandra Gonzalez Beltran for many useful discussions
Bye.
You can download this software now!
And yes.
It is open source!