What do we need to manage end-to-end scientific workflows for
efficiency and productivity?
Lavanya Ramakrishnan
[Figure: LEAD NAM forecast workflow. Tasks: Wrf Static, Terrain PreProcessor, Lateral Boundary Interpolator, ADAS Interpolator, ARPS2WRF, and WRF, with per-task runtimes ranging from 4 s to 4,570 s (the largest on 16 processors) and data volumes between tasks ranging from 0.2 MB to 2,422 MB.]
(CS-Biased) View of Workflow Challenges: LEAD North American Mesoscale (NAM) forecast (2009)
• Mostly simple workflow
• Repetitive, iterative, parametric studies
• Provenance
• Workflow and data sharing
• Mix of single-core and multi-core jobs managed by a service-based architecture
People use ad-hoc scripts, keep notes in text files, and encode metadata in file names.
DDM: Dula's Data Management (2012), Advanced Light Source
• Way too many hard drives
• Multiple hours per week spent manually copying/erasing data
• "Every beamline for itself!"
But haven’t we solved the workflow and data management problems?
What about all the workflow tools out there?
Downloading and pre-processing data can be complex
[Figure: MODIS re-projection pipeline at NERSC (http://modis.nersc.gov). Components: NASA FTP server (metadata and data), data transfer nodes, an FTP queue and a job queue, batch queue nodes, the HPSS file system, and a local metadata catalog. Steps: determine the files to download for a tile, create a file download record, create a job record, and access the re-projected output data and metadata.]
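As a rough illustration of the bookkeeping this pipeline implies, the sketch below queues download records and job records from a local metadata catalog. All names (FileDownloadRecord, determine_files_for_tile, the sample catalog contents) are hypothetical, not the actual modis.nersc.gov code.

# Illustrative sketch only: record and queue names are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class FileDownloadRecord:
    """One file to fetch from the NASA FTP server (metadata + data)."""
    tile: str
    remote_path: str
    status: str = "queued"          # queued -> transferring -> done
    created: datetime = field(default_factory=datetime.utcnow)

@dataclass
class JobRecord:
    """One re-projection job to submit to the batch queue."""
    tile: str
    input_files: list
    status: str = "pending"

def determine_files_for_tile(tile, catalog):
    """Consult the local metadata catalog for files not yet downloaded."""
    return [f for f in catalog.get(tile, []) if f not in downloaded]

# Local metadata catalog: tile -> remote files (contents are made up).
catalog = {"h08v05": ["MOD021KM.A2009001.h08v05.hdf",
                      "MOD03.A2009001.h08v05.hdf"]}
downloaded = set()

ftp_queue, job_queue = [], []
for tile in catalog:
    files = determine_files_for_tile(tile, catalog)
    ftp_queue.extend(FileDownloadRecord(tile, f) for f in files)   # FTP queue
    job_queue.append(JobRecord(tile, files))                       # job queue

print(len(ftp_queue), "downloads queued;", len(job_queue), "jobs queued")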
Ameriflux: Atmosphere-Biosphere Interactions in Climate Models. Data Processing Pipeline for Net Ecosystem Exchange (NEE)
• Data collection (high-frequency and meteorological data)
• Pre-processing and sensor calibration
• Processing into fluxes
• Post-processing and product generation
• Synthesis studies, models, simulations
Detailed processing steps: QA/QC flagging and visual checks; u* (ustar) calculation and filtering; gap-filling; flux partitioning; generation of data products (sketched below).
The structure of the workflow does not capture data complexities.
http://ameriflux.lbl.gov
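To make the flavor of those processing steps concrete, here is a minimal, purely illustrative sketch of QA/QC flagging, u* filtering, gap-filling, and flux partitioning chained in sequence. The function names, thresholds, and the naive mean gap-fill are assumptions for illustration, not the AmeriFlux production pipeline.

# Illustrative only: names and thresholds are hypothetical.
import math

def qa_qc_flag(records):
    """Flag implausible half-hourly flux values instead of dropping them."""
    for r in records:
        r["flag"] = not (-50.0 < r["nee"] < 50.0)      # crude plausibility window
    return records

def ustar_filter(records, ustar_threshold=0.2):
    """Mask night-time records with low friction velocity (u*)."""
    for r in records:
        if r["night"] and r["ustar"] < ustar_threshold:
            r["nee"] = math.nan
    return records

def gapfill(records):
    """Fill masked/missing values (here: a naive mean of valid records)."""
    valid = [r["nee"] for r in records if not math.isnan(r["nee"]) and not r["flag"]]
    mean = sum(valid) / len(valid)
    for r in records:
        if math.isnan(r["nee"]):
            r["nee"], r["gapfilled"] = mean, True
    return records

def partition_fluxes(records):
    """Split NEE into ecosystem respiration and GPP (placeholder logic)."""
    for r in records:
        r["reco"] = max(r["nee"], 0.0)
        r["gpp"] = r["reco"] - r["nee"]
    return records

# The stages run strictly in sequence -- the control flow is simple even
# though the data handling (versions, gaps, provenance) is not.
records = [{"nee": 3.1, "ustar": 0.1, "night": True, "flag": False},
           {"nee": -8.4, "ustar": 0.4, "night": False, "flag": False}]
for stage in (qa_qc_flag, ustar_filter, gapfill, partition_fluxes):
    records = stage(records)
print(records)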
Networking paths are still pretty complex: DayaBay Networking
• Relay Path: daisy-chains two data transfers (default)
• Direct Path: two data transfers out of site
• DayaNet: dedicated 150 Mbps optical link
• CSTNet: Chinese national network
• GLORIAD: trans-Pacific scientific network (NSF)
• ASGC: trans-Pacific eScience network (fallback for GLORIAD)
• ESnet: US national Energy Sciences Network
• Hot-swappable disk transport between Daya Bay and Hong Kong in case of long-term network failure of either DayaNet or CSTNet (a path-failover sketch follows below)
http://dayabay.ihep.ac.cn/
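The ordering of default path, fallback networks, and disk shipment amounts to a simple failover policy. The sketch below is a hypothetical illustration of that policy; the transfer functions and dataset name are placeholders, not DayaBay's actual transfer software.

# Illustrative sketch of ordered-path failover for a dataset transfer.
def transfer_via_relay(dataset):
    raise ConnectionError("relay hop unavailable")   # simulate a failure

def transfer_via_direct(dataset):
    print(f"{dataset}: transferred over direct path (2 hops out of site)")

def ship_disks(dataset):
    print(f"{dataset}: scheduled hot-swappable disk transport to Hong Kong")

# Default path first, then fallbacks; disk shipment is the last resort.
paths = [("relay", transfer_via_relay),
         ("direct", transfer_via_direct),
         ("disk", ship_disks)]

def move(dataset):
    for name, attempt in paths:
        try:
            attempt(dataset)
            return name
        except ConnectionError as err:
            print(f"{name} path failed: {err}; trying next path")
    raise RuntimeError("all transfer paths exhausted")

move("run_20120101.raw")   # hypothetical dataset name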
Failures of all magnitudes can occur
Real-time data processing presents additional challenges
[Figure: Framework. Components: beamline user and experiment, data transfer node, HPC storage and archive, HPC compute resources, a data pipeline carrying image, metadata, and reference data, a prompt analysis pipeline, a remote user, and control and feedback.]
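A toy version of such a prompt-analysis loop is sketched below: it polls a staging directory that a data transfer node might write into, runs a placeholder analysis, and pushes results back as feedback. The paths, file pattern, and functions are illustrative assumptions, not the facility's actual framework.

# Illustrative only: a minimal "prompt analysis" loop.
import time
from pathlib import Path

INCOMING = Path("/staging/beamline/incoming")    # hypothetical staging area
seen = set()

def analyze(image_path):
    """Placeholder for the prompt analysis pipeline (e.g., quick reconstruction)."""
    return {"image": image_path.name, "ok": image_path.stat().st_size > 0}

def send_feedback(result):
    """Placeholder: push results back to the beamline/remote user for steering."""
    print("feedback:", result)

def watch(poll_seconds=2, max_polls=3):
    """Poll for new images; a real system would use events, not polling."""
    for _ in range(max_polls):
        for image in sorted(INCOMING.glob("*.h5")):
            if image not in seen:
                seen.add(image)
                send_feedback(analyze(image))
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()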
How are we going to manage simulation and analysis workflows? (NERSC Cori System)
[Figure: NERSC Cori system (Cray Cascade, 64 cabinets): 9,304 KNL compute nodes, 1,920 Haswell compute nodes, and 384 Burst Buffer nodes; esLogin (14 nodes), esMS (2 nodes), MOM (28 nodes), DSL (32 nodes), DVS servers (32 nodes), network/RSIP nodes (10), boot/SDB and network nodes, boot RAID, SMW, and GigE/FC switches; 68 LNET routers and FDR IB links through redundant core IB switches to the parallel file system (12 scalable units: MGS 2 nodes, MDS 4 nodes, OSS 96 nodes, DDN WolfCreek, 5 NetApp E2700), plus 32 FDR IB links to NGF and 40 GigE connections to the NERSC network.]
It takes a village to build and run pipelines …
• Application scientists (and students and postdocs): know the use cases, write application codes, often write some scripts, and are the users who run the final workflow
• Workflow developers: compose workflows
• System integrators: write middleware/software pieces; computational model experts
• System/facility experts: HPC center and ESnet staff
In practice, these roles are not nearly as well defined as the list suggests.
Tigres: a workflow system “library”/”toolkit”
What do we need for end-to-end pipelines?
• Workflow tools need to be more than what runs on clusters/HPC: the data transfer, data processing, and storage environment is integral to scientific processes
• User/project requirements are changing, and users have to play a key role in tool development: scientists should focus on the science/algorithm; it is not practical to have a workflow developer for every scientist
• We need to think beyond our individual boxes: data, workflow, network, and resource management have to happen in conjunction
• Efficiencies (performance, energy, …), reliability, and productivity are only becoming more important
Even CS folks need workflow tools (e.g., Prabhat)!
Tigres: Design templates for common scientific workflow patterns
"LightSrc" Domain templates
Base Tigres templates
Scale up
Application "LightSrc-1"
Application "LightSrc-2"
Create andDebug
Share
Create andDebug
Implement templates as a library in an existing language
Early (friendly) release is now available! http://tigres.lbl.gov
Key Aspects of Tigres
• Targeted at large-scale, data-intensive workflows: motivated by the "MapReduce" model; no centralized management model
• Library model embedded in existing languages such as Python and C: "extend current scripting/programming tools"; API-based, embedded in code
• Light-weight execution framework: "as easy to run as an MPI program on an HPC resource"; no persistent services
• Scientist-centered design process: get feedback from users before writing all the code
Tigres Templates
[Figure: the four Tigres templates. Sequence: Task1 … TaskN run one after another. Parallel: Task1 … TaskN run independently. Split: a single task fans out into parallel tasks. Merge: parallel tasks fan in to a single task.]
Templates
• Sequence(name, task_array, input_array)
 – e.g., output[] = Sequence("my seq", task_array_12, input_array_12)
• Parallel(name, task_array, input_array)
 – e.g., output[] = Parallel("abc", task_array_12, input_array_12)
• Split(name, split_task, split_input_values, task_array, task_array_in)
 – e.g., Split("split", task_x1, input_value_1, spl_t_arr, spl_i_arr)
• Merge(name, task_array, input_array, merge_task, merge_input_values)
 – e.g., Merge("merge", syn_t_arr, syn_i_arr, task_x1, input_value_1)
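To show how these templates compose, here is a minimal sketch with stand-in implementations of Sequence and Parallel that follow the signatures above (Split and Merge would follow the same fan-out/fan-in idea). The stand-ins and the toy tasks are illustrative only; the released Tigres library's actual API and types may differ from this sketch.

# Stand-ins that mimic the template semantics; not the released Tigres API.
def Sequence(name, task_array, input_array):
    """Run tasks one after another, feeding each output to the next task."""
    data = input_array
    for task in task_array:
        data = task(data)
    return data

def Parallel(name, task_array, input_array):
    """Run each task on its own input (serially here, for brevity)."""
    return [task(inputs) for task, inputs in zip(task_array, input_array)]

# Toy tasks: a two-stage calibration followed by per-tile processing.
def calibrate(values):
    return [v * 1.1 for v in values]

def remove_spikes(values):
    return [v for v in values if v < 100]

def process_tile(values):
    return sum(values)

clean = Sequence("preprocess", [calibrate, remove_spikes], [10, 20, 150])
totals = Parallel("per-tile", [process_tile, process_tile], [clean, clean])
print(totals)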
[Figure: Tigres development layers (design, API, implementation, optimizations, execution environment) driven by a scientist-centered design process.]
Scientist-Centered Design Process
• Requirements gathering is not the same as usability studies
• Concept understanding by users
• Changes to nomenclature
• Support in C is also important
• Priorities for the first prototype: desktop to NERSC, monitoring, intermediate state management
Impact of Scientist-Centered Design
It took days rather than weeks (or months) for the first stub implementation!
Summary
• User-centered design processes are vital for development of next-generation tools
• We need libraries/toolkits that can be customized for specific needs
• Next-generation workflow/software ecosystems need to be holistic
Acknowledgements
• Tigres Team: Deb Agarwal (PI), Lavanya Ramakrishnan, Daniel Gunter, Gilberto Pastorello, Valerie Hendrix, Ryan Rodriguez
• NERSC/CRD: John Shalf, Nicholas Wright, Christopher Daley
• Science use cases: Daniel Chivers, John Kua, Michael Quinlan; Craig Tull, Dula Parkinson, Gilberto Pastorello; …and many others