Dr. Robert W. Wisniewski Chief Software Architect Extreme ... · (ELK/Splunk) Underlying Components...
Dr. Robert W. Wisniewski
Chief Software Architect Extreme Scale Computing, Intel
November 15, 2017
Session Agenda and Objective
• OpenHPC
– Value
– Goals
– Background
– Update
• Intel’s component work planned for submission to OpenHPC
Value from Community
• Stable HPC System Software that:
Fuels a vibrant and efficient HPC software ecosystem
Removes duplication of effort throughout community
Simplifies installation, configuration, and ongoing maintenance of a custom software stack
Takes advantage of hardware innovation and drives revolutionary technologies
Eases traditional HPC application development and testing at scale
Provides a development environment for new workloads (ML, analytics, big data, cloud)
A Shared Repository
OpenHPC - Mission and Vision
Mission: to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools.
Vision: OpenHPC components and best practices will enable and accelerate innovation and discoveries by broadening access to state-of-the-art, open-source HPC methods and tools in a consistent environment, supported by a collaborative, worldwide community of HPC users, developers, researchers, administrators, and vendors.
Recent article by Adrian Reber, OpenHPC TSC and Red Hat: https://opensource.com/article/17/11/openhpc
OpenHPC: A Brief History
• June 2015 (ISC’15): BoF on the merits/interest in a community effort
• Nov 2015 (SC’15): Initial v1.0 release; gather interested parties to work with Linux Foundation
• June 2016 (ISC’16): v1.1.1 release; Linux Foundation announces technical leadership, founding members, and governance
• Nov 2016 (SC’16): v1.2 release, BoF
• June 2017 (ISC’17): v1.3.1 release, BoF
• Nov 2017 (SC’17): v1.3.3 release, BoF
Current Project Members
Member participation interest? Please contact Jeff ErnstFriedman
Mixture of academics, labs, and industry
WWW.OpenHPC.Community
OpenHPC Stack Overview
OpenHPC v1.3.3 - Current S/W Components
Functional Area: Components
• Base OS: CentOS 7.4, SLES12 SP3
• Architecture: aarch64, x86_64
• Administrative Tools: Conman, Ganglia, Lmod, LosF, Nagios, pdsh, pdsh-mod-slurm, prun, EasyBuild, ClusterShell, mrsh, Genders, Shine, Spack, test-suite
• Provisioning: Warewulf, xCAT
• Resource Mgmt.: SLURM, Munge, PBS Professional, PMIx
• Runtimes: OpenMP, OCR, Singularity
• I/O Services: Lustre client, BeeGFS client
• Numerical/Scientific Libraries: Boost, GSL, FFTW, Hypre, Metis, Mumps, OpenBLAS, PETSc, PLASMA, Scalapack, Scotch, SLEPc, SuperLU, SuperLU_Dist, Trilinos
• I/O Libraries: HDF5 (pHDF5), NetCDF/pNetCDF (including C++ and Fortran interfaces), Adios
• Compiler Families: GNU (gcc, g++, gfortran), Clang/LLVM
• MPI Families: MVAPICH2, OpenMPI, MPICH
• Development Tools: Autotools, cmake, hwloc, mpi4py, R, SciPy/NumPy, Valgrind
• Performance Tools: PAPI, IMB, mpiP, pdtoolkit, TAU, Scalasca, ScoreP, SIONLib
Basic Cluster Install Example
• Starting install guide/recipe targeted for flat hierarchy
• Leverages image-based provisioner: Warewulf or xCAT
• PXE boot (stateful or stateless)
• Optionally connect external Lustre* or BeeGFS parallel file system
• Need hardware-specific information to support (remote) bare-metal provisioning
Target System Design
Large systems have a considerable number of Service Nodes (SNs)
– SMS – System Management Server
– Row/Rack Controllers
– I/O Nodes (IONs)
– Specialized servers for Fabric Manager (FM), Workload Manager/Resource Manager (RM), Database (DB)
For flexibility, the control system will target a “pool” of service nodes
– i.e., the control system has a mechanism to prefer service nodes for efficiency, affinity, or for necessary characteristics when assigning a particular function
– But, the control system has flexibility so it can spread work over many nodes for performance and resiliency
For max application performance, Compute Nodes (CNs) are avoided for execution of control system software
– Reduce noise; reduce footprint overhead in CNs
– Can leverage CNs when noise is ok or expected
– job control operations (start, kill, etc)
– activity between jobs
This approach encourages Scalable Unit packaging concepts
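The "pool of service nodes" idea can be sketched in a few lines of Python. This is only an illustration of the selection policy described on the slide (required characteristics first, then preference/affinity, then spreading by load); the `ServiceNode` class, the trait names, and `pick_service_node` are hypothetical and are not part of any Intel or OpenHPC component.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNode:
    name: str
    traits: set = field(default_factory=set)  # hypothetical labels, e.g. {"fabric", "db"}
    load: int = 0  # number of control-system functions currently assigned


def pick_service_node(pool, required=frozenset(), preferred=frozenset()):
    """Pick a node that has every required trait, favoring nodes that match
    more preferred traits and, among those, the least loaded node."""
    candidates = [n for n in pool if set(required) <= n.traits]
    if not candidates:
        raise LookupError("no service node satisfies the required traits")
    best = min(
        candidates,
        key=lambda n: (-len(set(preferred) & n.traits), n.load, n.name),
    )
    best.load += 1  # record the assignment so later picks spread out
    return best


pool = [
    ServiceNode("sn0", {"db"}),
    ServiceNode("sn1", {"fabric"}),
    ServiceNode("sn2"),
]
first = pick_service_node(pool, preferred={"db"})      # affinity wins
second = pick_service_node(pool)                       # least-loaded node wins
third = pick_service_node(pool, required={"fabric"})   # hard requirement wins
print(first.name, second.name, third.name)
```

Compute nodes are simply left out of the pool here, which mirrors the slide's point that control-system work should normally stay off CNs.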
Focus on Core System Software Components
• mOS – Scalable operating systems
• Unified Control System – Unified, Productive (single pane of glass), Reliable
• DAOS – Distributed Asynchronous Object Storage
• MPI – Scalable, high performance, topology optimized
• GEOPM – Global Extensible Open Power Manager
• PMIx – Process management with “Instant On”
mOS High-Level Architecture
• LWK performance for HPC applications
• Nimble to adapt to new technology
• Linux compatibility
• Better contained containers
Unified Control System
• Provide a comprehensive system view
• Advance the state of the art
• Build upon existing work
• Support the system lifecycle
[Figure: Unified Control System architecture. Labels recovered from the diagram:]
• Operator Interface (Web & REST)
• System manifest
• Data Access Interface (DAI) Framework (SCON, data tier access, daemon mgmt, resilient workitems, etc.)
• Providers: Security, Provisioner, Fabric Manager, Low-Level Control, Monitor, Workload Manager, Inventory, Service, Alerts, RAS
• Also labeled: Service Provider, Operator Interface Provider, SCON Provider, Master
• Underlying components (guided by DAI providers, but generally operate autonomously): SLURM, Warewulf, OPA FM, Actsys, Sensys, SCON
• Hardware: bmcs, pdus, cdus, rectifiers
• Unified Control System system data: Service, Inventory, Env Data, Alerts, RAS
• Data tiers: online tier (VoltDB), nearline tier (PostgreSQL), offline tier (ELK/Splunk)
• System data categories:
– Inventory: racks, drawers, chassis, blades, boards, switches, cpus, cables, cores, memory, devices, nodes
– Configuration: OS images, software components
– Environmental: temperature, voltage, current, coolant flow
– RAS: events, alerts
– Operations: jobs, provisioning, service ops
Distributed Asynchronous Object Storage (DAOS)
Open source, Apache 2.0 license
• Scale-out object store designed from the ground up for next-gen storage & fabric technologies
– High throughput/IOPS at arbitrary alignment/size
– Byte addressable for better application scalability
– No read-modify-write or false block sharing
– OS bypass with lightweight client/server
– Small memory footprint & low CPU usage
• Advanced storage API
– New scalable storage model suitable for both structured & unstructured data
– Non-blocking data & metadata operations
– Metadata & data query/indexing
• Software-defined storage platform
– Flexible storage provisioning
– Predictable performance/capacity
– Ease of management
• Seamless integration with Lustre
– Single unified namespace
[Figure: DAOS software stack. Applications: Traditional HPC Apps; Big Data & AI Apps. Interfaces and middleware: NetCDF, HDF5, Ext, MPI-IO, POSIX I/O, SCR, FTI, VeloC, Spark RDD, (No)SQL, Apache Arrow, Dataspaces. Building blocks: Argobots (thread model), Mercury (function shipping). Storage media: NVRAM, NVMe, HDD.]
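The "no read-modify-write" point on the DAOS slide is easier to see with a toy comparison. The sketch below is not DAOS code and uses no DAOS API; it simulates a block-granular store with an assumed 4096-byte block size and counts how many bytes move for a small unaligned write, compared with a byte-addressable store. `block_write` and `byte_write` are hypothetical helpers.

```python
BLOCK = 4096  # assumed block size in bytes, for illustration only


def block_write(store: bytearray, offset: int, data: bytes) -> int:
    """Write through a block-granular interface and return bytes moved.

    Every touched block is read, patched in memory, and written back.
    (This toy does so even for fully overwritten blocks.)
    """
    moved = 0
    first = offset // BLOCK
    last = (offset + len(data) - 1) // BLOCK
    for b in range(first, last + 1):
        start = b * BLOCK
        block = bytearray(store[start:start + BLOCK])            # read
        lo = max(offset, start)
        hi = min(offset + len(data), start + BLOCK)
        block[lo - start:hi - start] = data[lo - offset:hi - offset]  # modify
        store[start:start + BLOCK] = block                       # write
        moved += 2 * BLOCK
    return moved


def byte_write(store: bytearray, offset: int, data: bytes) -> int:
    """Write through a byte-addressable interface and return bytes moved."""
    store[offset:offset + len(data)] = data
    return len(data)


a = bytearray(2 * BLOCK)
b = bytearray(2 * BLOCK)
payload = b"x" * 64
moved_block = block_write(a, 4090, payload)  # 64 bytes straddling two blocks
moved_byte = byte_write(b, 4090, payload)
print(moved_block, moved_byte)  # 16384 64
```

Both stores end up with identical contents; only the amount of data moved differs. Two writers updating different bytes of the same block would also contend on that block in the first model, which is the "false block sharing" the slide refers to.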
MPICH-OFI
• Open-source implementation based on MPICH
• Uses the new CH4 infrastructure
– Co-designed with MPICH community
• Targets existing and new fabrics via next-gen Open Fabrics Interface (OFI)
– Ethernet/sockets, Intel® Omni-Path, Cray Aries*, IBM BG/Q*, InfiniBand*
• Improving Performance
– Topology-aware collectives and process mapping
– Optimizations for newer networks
– Thread performance enhancements
• Support for automated configuration
• Added support for latest libfabric 1.5 features
GEOPM (Global Extensible Open Power Manager)
• Runtime for in-band power management and optimization
– On-the-fly monitoring of hardware counters and application profiling
– Feedback-guided optimization of hardware control knob settings
– Open source software (flexible BSD three-clause license)
• Extensible and portable through plugin architecture
– Enables portability beyond x86 architectures (truly open)
– Enables rapid prototyping of new power management strategies
– Accommodates the reality that different sites have different constraints and preferences for performance vs. energy savings
• Designed for holistic optimization across a whole job
– Job-wide global optimization of HW control knob settings
– Application-awareness for max speedup or energy savings
– Scalable via distributed tree-hierarchical design, algorithms
[Figure: GEOPM tree hierarchy. Labels: Power-Aware RM / Scheduler; GEOPM Root; GEOPM Aggregator (x2); GEOPM Leaf (x4), each with a Processor and a group of MPI ranks (0 to i-1, i to j-1, j to k-1, k to n-1); GEOPM Controller; MPI Comms Overlay; Shared Mem Region (SHM); MSR via msr-safe (or other drivers for non-x86 platforms).]
Project url: http://geopm.github.io/geopm
Contact: [email protected]
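The tree-hierarchical design named on the GEOPM slide (root, aggregators, leaves) can be illustrated with a small sketch. This is not GEOPM's algorithm or API: the `Node` class and the policy of splitting a job-wide power budget in proportion to measured consumption are assumptions chosen only to show how measurements flow up a tree and decisions flow back down.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    measured_watts: float = 0.0  # set directly on leaves, computed elsewhere
    budget_watts: float = 0.0


def aggregate(node):
    """Sum measured power from the leaves up to this node."""
    if node.children:
        node.measured_watts = sum(aggregate(c) for c in node.children)
    return node.measured_watts


def distribute(node, budget):
    """Push a budget down, giving each child a share proportional to its
    measured power (an arbitrary illustrative policy)."""
    node.budget_watts = budget
    total = sum(c.measured_watts for c in node.children)
    for c in node.children:
        share = c.measured_watts / total if total else 1 / len(node.children)
        distribute(c, budget * share)


leaves = [Node(f"leaf{i}", measured_watts=w)
          for i, w in enumerate([100.0, 100.0, 150.0, 50.0])]
root = Node("root", [Node("agg0", leaves[:2]), Node("agg1", leaves[2:])])

aggregate(root)          # measurements flow up: leaves -> aggregators -> root
distribute(root, 320.0)  # job-wide budget flows down the same tree
budgets = [leaf.budget_watts for leaf in leaves]
print(root.measured_watts, budgets)
```

Each node only talks to its parent and children, so the per-node work stays bounded as the job grows, which is the scalability argument behind a tree layout.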
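What is PMIx?
[Figure: timeline from PMI-1/PMI-2 to PMIx. Labels recovered from the diagram:]
• PMI-1, PMI-2: wireup support, dynamic spawn, keyval publish/lookup
• Shown with PMI-1/PMI-2: MPICH, PGAS, others; resource managers (RM): SLURM, ALPS
• Years go by…
• 2015: Exascale systems on horizon; launch times long; new paradigms
• 2016 and 2017: PMIx v1.2, PMIx v2.x; "Exascale launch in < 30s", "Exascale launch in < 10s", Orchestration
• Shown with PMIx: OMPI, Spectrum, OSHMEM, SOS, PGAS, others; resource managers (RM): SLURM, JSM, others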
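The "keyval publish/lookup" and "wireup support" labels describe a pattern that a toy model can make concrete: every process publishes its contact information, all processes synchronize, and then each one looks up its peers. The sketch below is a single-process simulation and deliberately does not use the PMIx API; `ToyKeyValSpace`, its methods, and the endpoint strings are hypothetical.

```python
class ToyKeyValSpace:
    """In-process stand-in for a job-wide key-value space."""

    def __init__(self, nranks):
        self.nranks = nranks
        self._pending = {}  # published, not yet visible to other ranks
        self._visible = {}  # visible to everyone after a fence

    def publish(self, rank, key, value):
        self._pending[(rank, key)] = value

    def fence(self):
        """Synchronization point: make all pending entries visible."""
        self._visible.update(self._pending)
        self._pending.clear()

    def lookup(self, rank, key):
        return self._visible[(rank, key)]  # KeyError if not yet visible


kvs = ToyKeyValSpace(nranks=4)

# Step 1: every rank publishes where it can be reached.
for rank in range(kvs.nranks):
    kvs.publish(rank, "endpoint", f"node{rank // 2}:{5000 + rank}")

# Step 2: collective synchronization.
kvs.fence()

# Step 3: any rank can now look up any peer and connect to it.
peers = {r: kvs.lookup(r, "endpoint") for r in range(kvs.nranks)}
print(peers[3])  # node1:5003
```

At scale the expensive part is the exchange behind the synchronization step, which is why launch time appears as a headline metric on the slide.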
Three Distinct Entities
• PMIx Standard
Defined set of APIs, attribute strings
Nothing about implementation
• PMIx Reference Library
A full-featured implementation of the Standard
Intended to ease adoption
• PMIx Reference Server
Full-featured “shim” to a non-PMIx RM
Backup
Hierarchical Overlay for OpenHPC software
OpenHPC Development Infrastructure
• The ‘usual’ software engineering stuff:
– GitHub (SCM and issue tracking/planning)
– Continuous Integration (CI) Testing (Jenkins)
– Documentation (LaTeX)
• Capable build/packaging system
– At present: we target a common delivery/access mechanism that adopts Linux sysadmin familiarity
– Require flexible system to manage builds
– A system using Open Build Service (OBS) supported by back-end git
OpenHPC Build System: OBS
• Manage Build Process
• Drive Builds for multiple repositories
• Repeatable builds
• Generate binary and src rpms
• Publish corresponding package repositories
• Client/server architecture supports distributed build slaves and multiple architectures
OpenHPC Integration/Testing/Validation
• Install recipes
• Cross-package interaction
• Development environment
• Mimic use cases common in HPC deployments
• Upgrade mechanism
OpenHPC Integration/Test/Validation
• Standalone integration test infrastructure
• Families of tests that could be used during:
– Initial install process (can we build a system?)
– Post-install process (does it work?)
– Developing tests that touch all of the major components (can we compile against 3rd party libraries, will they execute under resource manager, etc.?)
• Expectation is that each new component included will need corresponding integration test collateral
• Integration tests are included in the GitHub repo
• Global testing harness includes a number of embedded subcomponents
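One way to picture "families of tests" is a small registry that groups checks by the phase they belong to. The sketch below is an assumption-laden illustration, not the OpenHPC test suite: the `family` decorator, the check names, and the `system` dictionary of pre-collected facts are all hypothetical.

```python
from collections import defaultdict

FAMILIES = defaultdict(list)  # family name -> list of check functions


def family(name):
    """Decorator that registers a check under a named family."""
    def register(fn):
        FAMILIES[name].append(fn)
        return fn
    return register


@family("install")
def provisioning_image_defined(system):
    # "Can we build a system?"
    return bool(system.get("image"))


@family("post-install")
def resource_manager_sees_all_nodes(system):
    # "Does it work?"
    return system.get("nodes_up") == system.get("nodes_expected")


@family("post-install")
def mpi_hello_ran(system):
    # Stand-in for "compile against a library and run under the RM".
    return system.get("mpi_hello_exit_code") == 0


def run_family(name, system):
    """Run every check in one family and report pass/fail per check."""
    return {fn.__name__: fn(system) for fn in FAMILIES[name]}


# Hypothetical facts gathered from a cluster by some other tool.
system = {
    "image": "compute.img",
    "nodes_expected": 4,
    "nodes_up": 3,
    "mpi_hello_exit_code": 0,
}
install_results = run_family("install", system)
post_results = run_family("post-install", system)
print(install_results)
print(post_results)
```

Adding a component then means adding one or more checks to the relevant family, which matches the expectation that each new component brings its own integration test collateral.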