ORNL is managed by UT-Battelle for the US Department of Energy
ORNL Science DMZ and Bridging CADES Workflows
Compute and Data Environment for Science (CADES)
Advanced Data Workflow Group
Ryan Prout
Goals of The Presentation
• CADES overview
• SDMZ Architecture
• Workflows and Projects
• Future
CADES Resources
• OpenStack Cloud
– 16 Hypervisors (1,024 vCPUs, 2 TB memory, 20.5 TB storage)
– Lustre host aggregate
– Birthright for the lab
– Expanding quickly
• Compute
• Storage
– Lustre, NFS, Scality (Research Data Archive)
• DTNs and SDMZ
• Workflow design and support
CADES Deployment
OIC
Cray Condos
CADES Moderate
CADES Open
Hybrid Cloud
Unique Heterogeneous Platforms
Large-Scale Storage
PHI Enclave
High-Speed Interconnects
• ~6000 Cores of Integrated Condos on InfiniBand
• ~5000 Cores of Hybrid, Expandable Cloud
• SGI UV, Urika-GD/XA: GX
• 5 PB+ High-Speed Storage
• ~3000 Cores of XK7

• ~5000 Cores of Integrated Condos on InfiniBand
• ~10,000 OIC Cores
• Attested PHI Enclave
• Integrated with UCAMS and XCAMS
... and several other smaller projects
... and several ORNL projects on OIC
Object store
Science DMZ roadmap
ORNL SDMZ Advantages
• Create advanced workflows
– CrossBOW Project
– ARM <-> CADES <-> NCCS/OLCF
• High performance
• Scalability
• Internal and External collaborations
• Scientific workflow systems
Bridging workflows through SDMZ
• ARM <-> CADES <-> NCCS/OLCF
– Atmospheric Radiation Measurement Climate Research Facility
– Phase I: globus-url-copy for data movement and automation
– Phase II: Globus APIs and application
• CrossBOW Project (Cross-platform Big Data Operational Workflows)
– Globus APIs and CKAN server with CrossBOW API
– Focus on deep learning workflows
– Challenge of automating and scheduling analysis tasks
– https://ramanathanlab.org/cosc526/
ARM Resource Overview
Phase I: ARM Workflow and Connection Types
[Diagram: Phase I data paths across the Science DMZ.
Nodes shown: ARM-DTN, CADES-ARM-DTN, OpenDTN, HPSS-DTN, StratusLogin, CumulusLogin, CADES Compute/Storage, OLCF Compute/Storage, long term storage.
Connection labels: SSHFTP, GSIFTP, Data, Compute Jobs, storage mount.]
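
Phase I moves data with globus-url-copy over the SSHFTP and GSIFTP connection types named on this slide. The sketch below shows one way such a transfer could be scripted for automation: it only builds the command line and does not run it. The host names and paths are hypothetical placeholders, and the flag selection (`-vb`, `-fast`, `-p`, `-cd`, `-r`) is an assumption about a reasonable default, not the configuration used at ORNL.

```python
import shlex
from typing import List

# URL schemes this sketch accepts; gsiftp:// and sshftp:// match the
# connection types shown on the Phase I slide.
ALLOWED_SCHEMES = ("gsiftp://", "sshftp://", "file://")


def build_copy_command(src_url: str, dst_url: str,
                       parallel_streams: int = 4,
                       recursive: bool = False) -> List[str]:
    """Return a globus-url-copy argument list (the command is not executed)."""
    for url in (src_url, dst_url):
        if not url.startswith(ALLOWED_SCHEMES):
            raise ValueError(f"unsupported URL scheme: {url}")
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")

    cmd = [
        "globus-url-copy",
        "-vb",                          # report transfer performance
        "-fast",                        # reuse data channels
        "-p", str(parallel_streams),    # number of parallel streams
        "-cd",                          # create destination directories
    ]
    if recursive:
        # Directory transfers need both URLs to end with "/".
        if not (src_url.endswith("/") and dst_url.endswith("/")):
            raise ValueError("recursive transfers need URLs ending in '/'")
        cmd.append("-r")
    cmd += [src_url, dst_url]
    return cmd


if __name__ == "__main__":
    # Hypothetical hosts and paths, for illustration only.
    command = build_copy_command(
        "gsiftp://arm-dtn.example.org/data/site/",
        "gsiftp://cades-dtn.example.org/lustre/arm/site/",
        recursive=True,
    )
    print(" ".join(shlex.quote(part) for part in command))
```

A scheduler such as cron could pass the returned list to `subprocess.run` once valid credentials are in place; keeping command construction separate from execution makes the automation easier to test.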
Phase II: ARM Workflow
• Start working towards the utilization of Globus APIs
• Shared Endpoints
• Integrate workflow portals
• Use the Phase I time to better understand processes and needs
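
As a rough illustration of what moving to the Globus APIs could involve, the sketch below assembles the JSON document that the Globus Transfer REST API accepts when a transfer task is submitted. It uses only the standard library and sends nothing over the network. The endpoint IDs, paths, label, and submission ID are hypothetical placeholders; in practice the submission ID is obtained from the Transfer service and the request needs an access token.

```python
import json
from typing import Dict, List, Tuple

# (source_path, destination_path, recursive)
TransferItem = Tuple[str, str, bool]


def build_transfer_document(submission_id: str,
                            source_endpoint: str,
                            destination_endpoint: str,
                            items: List[TransferItem],
                            label: str = "ARM to CADES transfer") -> Dict:
    """Build a transfer submission document (nothing is sent anywhere)."""
    if not items:
        raise ValueError("at least one transfer item is required")
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "label": label,
        "verify_checksum": True,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }


if __name__ == "__main__":
    # All identifiers below are made up for illustration.
    document = build_transfer_document(
        submission_id="00000000-0000-0000-0000-000000000000",
        source_endpoint="hypothetical-arm-endpoint-id",
        destination_endpoint="hypothetical-cades-endpoint-id",
        items=[("/data/site/", "/lustre/arm/site/", True)],
    )
    print(json.dumps(document, indent=2))
```

The Globus Python SDK wraps the same document in helper classes, so a production workflow would more likely use the SDK than hand-built JSON.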
CrossBOW: Cross-platform Big data Operational Workflow
Front-end portal: http://data.ornl.gov
• Possible connections to ORNL (i.e., registered OLCF) users
• XCAMS integration
• Scientific datasets
CKAN repository
• File pointers within Lustre file system as URIs
• Access based on OLCF / CADES policies
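
To make the idea of file pointers stored as URIs concrete, here is a minimal sketch of how a client might read resource URIs out of a CKAN `package_show` response and turn them into file system paths. The response shape follows the CKAN Action API. The use of the `file://` scheme for Lustre pointers, the dataset name, and the paths are assumptions made for illustration; the slides do not say which URI scheme CrossBOW uses.

```python
from typing import Dict, List, Optional
from urllib.parse import unquote, urlparse


def resource_uris(package_show_response: Dict) -> List[str]:
    """Collect resource URLs from a CKAN package_show response."""
    if not package_show_response.get("success"):
        raise ValueError("CKAN action did not succeed")
    resources = package_show_response["result"].get("resources", [])
    return [res["url"] for res in resources if res.get("url")]


def local_path(uri: str) -> Optional[str]:
    """Return the file system path for a file:// URI, else None."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return None
    return unquote(parsed.path)


if __name__ == "__main__":
    # Hand-written sample response; names and paths are hypothetical.
    sample = {
        "success": True,
        "result": {
            "name": "example-dataset",
            "resources": [
                {"name": "run1", "url": "file:///lustre/proj/example/run1.h5"},
                {"name": "notes", "url": "https://example.org/notes.txt"},
            ],
        },
    }
    for uri in resource_uris(sample):
        print(uri, "->", local_path(uri))
```

Access checks based on OLCF / CADES policies would still apply when the resolved path is opened; the sketch only covers the lookup step.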
LUSTRE PFS
Workflow Scheduler
METIS
DGX-1
Spark VMs
RHEA
TITAN
Model Cache
Model Zoo
Intermediate Results
DL-Specific
ML-Specific
ORNL LDRD 8279
Inside CrossBOW
[Diagram: CrossBOW internals.
Components shown: Data Manager, Resource Manager, Model Manager, Visualization Manager, Parameter Manager, Hyperopt, Spearmint, Resource Listener (two instances), CKAN Web-service (two instances), CKAN repository, HPFS, Spark VMs, RHEA, Model Cache, Model Zoo, Intermediate Results.
Flow labels: New data available; Schedule runner; Optional spawn Spark; Schedule parameter sweep; Fetch model; Preload model; Resource URIs.
Side note: Overlap with SWIFT.]
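
The diagram labels "New data available" and "Schedule parameter sweep" suggest an event-driven step in which newly registered data triggers a set of analysis runs. The sketch below is a hypothetical illustration of that idea, not CrossBOW code: it expands a small search space into one task per parameter combination using a plain grid search, which stands in for the Hyperopt and Spearmint optimizers named on the slide. All function names, parameter names, and the URI are made up.

```python
import itertools
from typing import Any, Dict, Iterator, List, Sequence


def parameter_grid(space: Dict[str, Sequence[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of values in the search space."""
    names = sorted(space)  # fixed ordering keeps the output deterministic
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def schedule_sweep(resource_uri: str,
                   space: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Create one task description per parameter combination."""
    return [
        {"resource_uri": resource_uri, "params": params}
        for params in parameter_grid(space)
    ]


if __name__ == "__main__":
    # Hypothetical event: a new dataset URI arrives from the repository.
    tasks = schedule_sweep(
        "file:///lustre/proj/example/run1.h5",
        {"learning_rate": [0.1, 0.01], "batch_size": [32, 64]},
    )
    for task in tasks:
        print(task)
```

A grid grows multiplicatively with each added parameter, which is one reason tools like Hyperopt and Spearmint sample or model the search space instead of enumerating it.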
Grid and Cloud Engine
Catania Science Gateway Framework
http://csgf.readthedocs.io/en/latest/grid-and-cloud-engine/docs/index.html
Similar to CrossBOW in the “Engine” piece
VA Data Transfer – Genomics Research
• Private 10G circuit for data transfer
– globus-url-copy between sites (not currently allowed to talk with Globus)
– Private SDMZ
• Cloud infrastructure for researchers
– Big Data – Spark Cluster
– VMs
– Science Gateway
Future Work
• “Beef up” ORNL SDMZ infrastructure
• Boost ORNL SDMZ project usage
• Collaboration on SDMZ workflow systems
• Investigate Globus API building blocks and portal integration
• Create abstracted cross-infrastructure environment to enable easy workflow automation
• Make data sharing easy between environments
• Private SDMZ
– Medical SDMZ?
Credit to others
Susan Hicks (CADES)
Jason Anderson (OLCF)
Dustin Leverman (OLCF)
Anthony Clodfelter (ARM)
Rob Records (ARM)
Arvind Ramanathan (CrossBOW)