Building a Real Workflow Thursday morning, 9:00 am Greg Thain University of Wisconsin - Madison.
Building a Real Workflow
Thursday morning, 9:00 am
Greg Thain <[email protected]>
University of Wisconsin - Madison
2012 OSG User School
Overview
• From script…
  • Pragmatics
  • Estimation
• To production
From schematics…
… to the real world
It starts with a script…
#!/bin/sh
# Read one file name per line from stdin; write each file's MD5 checksum to "output".
while IFS= read -r n
do
    md5sum "$n"
done > output
Scriptify as much as possible
• What is the minimal number of manual steps?
• Even 1 might be too many.
• Zero is perfect!
First run locally: to measure usage
• Did you get all the inputs?
• Did it run correctly?
  • Are you sure?
• Run once remotely
  • NOT ON THE SUBMIT MACHINE!
  • You might be surprised
• Once working, run a couple of times
• If big variance, should you take the…
  • Average? Median? Worst case?
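One way to look at the average, median, and worst case side by side is to time a few runs and summarize them. The sketch below is illustrative only: the function name `summarize_times` is hypothetical, and it assumes run times in seconds, one per line.

```shell
#!/bin/sh
# summarize_times: read run times in seconds (one per line) from stdin and
# print the average, median, and worst case.
# Hypothetical helper for illustration; not part of HTCondor or OSG tooling.
summarize_times() {
    sort -n | awk '
        { t[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            # Median: the middle value, or the mean of the two middle values.
            if (NR % 2) { median = t[(NR + 1) / 2] }
            else        { median = (t[NR / 2] + t[NR / 2 + 1]) / 2 }
            printf "avg=%.1f median=%.1f worst=%.1f\n", sum / NR, median, t[NR]
        }'
}

# Example (my_job.sh is a placeholder for the real job script):
#   for i in 1 2 3; do
#       start=$(date +%s); ./my_job.sh; end=$(date +%s)
#       echo $((end - start))
#   done | summarize_times
```

If the three numbers are close, any of them will do for planning; if they differ widely, the worst case is the safer input for resource requests.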
Estimations: Orders of Magnitude
• Don’t sweat the small stuff
  • AMD == Intel == 2.4 GHz == 3.6 GHz
• But pay attention to the big stuff
• What makes this hard?
  • From nanoseconds to terabytes
• Powers of Ten, by Charles and Ray Eames
Resources Jobs Need
• CPU
  • Wall clock vs. CPU cycles
• Disk
  • Working (execute side)
  • Total (submit side)
  • Bandwidth
  • File transfer queue time
• Network bandwidth
  • Usually for file transfer only
User Log shows all
005 (2576205.000.000) 06/07 14:12:55 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    5  -  Run Bytes Sent By Job
    104857640  -  Run Bytes Received By Job
    5  -  Total Bytes Sent By Job
    104857640  -  Total Bytes Received By Job
    Partitionable Resources :    Usage  Request
       Cpus                 :                 1
       Disk (KB)            :   125000   125000
       Memory (MB)          :       30      100
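The usage table at the end of a termination event can be pulled out with standard text tools. A minimal sketch, assuming the rows look like the excerpt above (label, unit, colon, usage, request); the function name `usage_vs_request` and the file name `job.log` are hypothetical.

```shell
#!/bin/sh
# usage_vs_request: print usage and request for the Disk and Memory rows of
# the "Partitionable Resources" table in a user log read from stdin.
# Assumes rows of the form "Memory (MB) : 30 100", as in the excerpt.
usage_vs_request() {
    awk '
        ($1 == "Disk" || $1 == "Memory") && $3 == ":" && NF >= 5 {
            printf "%s %s used=%s requested=%s\n", $1, $2, $4, $5
        }'
}

# Example: usage_vs_request < job.log
```

Comparing the two columns after a test run shows whether the request is far larger than what the job actually used, as with the memory row in the excerpt.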
High Throughput
• What isn’t High Throughput?
  • Quick starting jobs
  • Very short job runtimes
  • Micro optimizations
• What is?
  • Constant job pressure
  • Many jobs
  • Long total workflow times
Rules of Thumb for OSG Jobs
• CPU (single-threaded)
  • Use between 5 minutes and 2 hours of wall-clock time
  • Upper limit is somewhat soft
• Disk
  • Keep scratch working space < 20 GB
  • Minimize OSG_APP_DATA usage
  • Submit disk usage depends on machine
  • Intermediate needs vs. sinks/sources
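The CPU rule of thumb can be turned into a quick check on a measured run time. This is only a sketch: `check_runtime` is a hypothetical helper, and the 300-second and 7200-second thresholds are simply the 5-minute and 2-hour figures from the slide.

```shell
#!/bin/sh
# check_runtime: classify a measured wall-clock time (whole seconds) against
# the 5-minute to 2-hour rule of thumb. Hypothetical helper for illustration.
check_runtime() {
    secs=$1
    if [ "$secs" -lt 300 ]; then
        echo "too short: consider merging with other jobs"
    elif [ "$secs" -le 7200 ]; then
        echo "ok"
    else
        echo "long: consider splitting (the upper limit is soft)"
    fi
}

# Example: check_runtime 3600
```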
Rules of Thumb (cont.)
• Network
  • Primarily file I/O
  • Fill in FILE SIZE / RUN TIME HERE
• Use squid caching where appropriate
How to Apply Rules of Thumb
• Not always clear cut
• Need to keep rules of thumb in mind
• May need to join many short processes
  • Easy with a shell wrapper
  • But be careful with error conditions
• Or break up one big one…
Golden Rules for DAGs
• Beware of the shish kabob!
• Use a single user log
• Use PRE and POST scripts
• RETRY is your friend
• DAGs of DAGs are good
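Several of these rules map directly onto DAGMan input-file keywords (`SCRIPT PRE`, `SCRIPT POST`, `RETRY`, `SUBDAG EXTERNAL`). The sketch below writes a small example DAG file from the shell. All node names, submit-file names, and script names are hypothetical, and the shape of the DAG is an assumption made for illustration.

```shell
#!/bin/sh
# write_example_dag: write a small DAGMan input file to the path given as $1.
# Node names, submit files, and script names are hypothetical.
write_example_dag() {
    cat > "$1" <<'EOF'
# Hypothetical workflow: A, then B and a nested DAG in parallel, then C.
JOB A a.sub
JOB B b.sub
JOB C c.sub
# DAGs of DAGs: inner.dag runs as a single node of this DAG.
SUBDAG EXTERNAL Inner inner.dag
PARENT A CHILD B Inner
PARENT B Inner CHILD C
# PRE runs before node A's job is submitted; POST runs after node C's job ends.
SCRIPT PRE A stage_inputs.sh
SCRIPT POST C check_outputs.sh
# Retry B up to 3 times if it fails.
RETRY B 3
EOF
}

# Example: write_example_dag workflow.dag && condor_submit_dag workflow.dag
```

For the single-user-log rule, each submit file would point its `log` command at the same file.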
Do you have the full DAG?
• Are there manual steps (esp. at beginning and end) that could be automated?
Start with “Conceptual DAG”
[Diagram: DAG with nodes A, B1, B2, B3, C1, C2, C3, D]
Map to actual workflow based on resource usage
[Diagram: DAG with nodes A, B1, B2, B3, C1, C2, C3, D]
Two transforms
• Merging nodes
• Splitting nodes
Merging is easy
• Scripting
  • Avoids transfer of intermediate files
  • Careful with error conditions
  • Debugging can be a bit tricky
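A merged node is typically a wrapper that runs the original steps back to back inside one job. The sketch below shows one way to keep error conditions visible; `run_steps` and the step names in the example are hypothetical.

```shell
#!/bin/sh
# run_steps: run each command string in order inside one job, stopping at the
# first failure and returning that command's exit code so it is not masked.
# Hypothetical wrapper for illustration.
run_steps() {
    for step in "$@"; do
        echo "running: $step" >&2
        sh -c "$step"
        rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "step failed (exit $rc): $step" >&2
            return "$rc"
        fi
    done
    return 0
}

# Example: run_steps "./step_b.sh input.dat" "./step_c.sh"
```

Intermediate files stay in the job's working directory between steps, so nothing is transferred until the last step is done. The messages on stderr help with debugging, since a failure could come from any of the merged steps.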
Breaking up is hard to do…
• Ideally into parallel jobs
  • Not always possible
• Often need to checkpoint
  • Standard universe can help
  • User-defined checkpointing
  • Checkpoint images can be hard to manage
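User-defined checkpointing can be as simple as recording how far the job got. The following is a sketch under stated assumptions, not a general solution: the work is a sequence of numbered items, `work` is a stand-in for the real computation, and all names are hypothetical.

```shell
#!/bin/sh
# work: stand-in for the real per-item computation (hypothetical).
work() {
    echo "item $1" >> "$2"
}

# process_with_checkpoint TOTAL CKPT OUT: handle items 1..TOTAL, recording the
# last finished item in CKPT so that a restarted job resumes where it stopped.
process_with_checkpoint() {
    total=$1 ckpt=$2 out=$3
    last=0
    # Resume from the checkpoint if one exists.
    if [ -f "$ckpt" ]; then
        last=$(cat "$ckpt")
    fi
    i=$((last + 1))
    while [ "$i" -le "$total" ]; do
        work "$i" "$out" || return 1
        # Write to a temporary file, then rename, so that an interrupted
        # write cannot leave a partial checkpoint behind.
        echo "$i" > "$ckpt.tmp" && mv "$ckpt.tmp" "$ckpt"
        i=$((i + 1))
    done
    return 0
}

# Example: process_with_checkpoint 1000 progress.ckpt results.txt
```

If the job is interrupted after an item finishes but before the checkpoint is updated, that item runs again on restart, so the per-item work should be safe to repeat.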
Putting it all together
• Should have one functional DAG
• With appropriate run times
Questions?
• Questions? Comments? Feel free to ask me questions later:
Upcoming sessions