Vision for System and Resource Management
of the Swiss-Tx class of Supercomputers
Josef Nemecek
ETH Zürich & Supercomputing Systems AG
09.03.2000 SOS Workshop 2000 (New Orleans, LA) 2
Agenda
- The Supercomputer Lifecycle then and now
- The Swiss-T1 Management SW: COSMOS (Commodity Supercomputer Management Operating System)
  - The goals of COSMOS
  - The concept of COSMOS
  - Implementation of COSMOS
- Software Integration with existing Parts
- Roadmap of COSMOS
Supercomputers – Then and Now
- Development by vendor
  - Hardware was hand-made
  - Software was tailored for hardware
- Customers just had to order out of the vendor’s catalogue

[Diagram: lifecycle stages Need, Order, Test, Manage; cost in $$$]
Supercomputers – Then and Now
- System looks like a puzzle
  - Commodity parts, multiple vendors
  - Zoo of interacting software components
- Individual system management
  - Millions of lines of code (scripts, daemons)

[Diagram: lifecycle stages Thought, Design, Simulation, Manage; further labels Architecture, Topology, Needs, Specification; cost in $$$ & t]
COSMOS – Goals
- Integrated management for whole lifecycle
  - Design the supercomputer on-line
  - Simulate the supercomputer performance on-line
  - Build the designed and simulated supercomputer
  - Manage the built supercomputer
- Complete run-time system management
  - Fault-tolerance on all (or most) system levels
  - Remote manageability of the whole supercomputer
  - Low run-time overhead for the system management
COSMOS – Supercomputer Design
- Architecture selection
  - SAN technology
  - Node technology
- Topology selection
  - Every topology has its +/–
- Resource usage
  - Cost of the supercomputer
  - Space, electrical power
- Performance estimation
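To make the point about topology trade-offs concrete, the following minimal sketch compares three textbook interconnect topologies by link count (a rough proxy for cost) and diameter (a rough proxy for worst-case latency). The choice of topologies, the function names and the 64-node example are assumptions for illustration; the slides do not say which topologies COSMOS considers.

```python
import math

def ring(n):
    """Ring of n >= 3 nodes: n links, every node has 2 neighbours."""
    return {"links": n, "degree": 2, "diameter": n // 2}

def torus_2d(k):
    """k x k torus with k >= 3: each node has 4 neighbours."""
    return {"links": 2 * k * k, "degree": 4, "diameter": 2 * (k // 2)}

def hypercube(d):
    """d-dimensional hypercube with 2**d nodes."""
    return {"links": d * 2 ** (d - 1), "degree": d, "diameter": d}

def compare(nodes=64):
    """Compare the three topologies for a node count that is both
    a perfect square and a power of two (hypothetical restriction)."""
    k = math.isqrt(nodes)
    d = nodes.bit_length() - 1
    if k * k != nodes or 2 ** d != nodes:
        raise ValueError("node count must be a square and a power of two")
    return {
        "ring": ring(nodes),
        "torus_2d": torus_2d(k),
        "hypercube": hypercube(d),
    }

if __name__ == "__main__":
    for name, props in compare(64).items():
        print(name, props)
```

For the same node count, the ring needs the fewest links but has the largest diameter, while the hypercube has the smallest diameter at the price of more links and more ports per node.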
COSMOS – Goals
- Single-system view of whole system
  - Allows one-point system management
  - Allows remote system management
- High availability of the system management
  - Allows high over-all system up-times
  - Allows dynamic configuration changes
- Modular software design
  - System-independent concept & design
  - Interfaces to existing management software modules
COSMOS – Concept
- Configuration: Control the system
- Monitoring: Observe the system
- Planning: When? Who? What?
- Security: Stability & independence
- Faults & Traps: Help the system
- Accounting: Charge the usage

- Complete, integrated system management
- Remote management from everywhere
- No administrative programming necessary
COSMOS – Implementation
System Management modules:
- Node Management: State control and monitoring of the nodes, accounting
- SAN Management: SAN-dependent management and monitoring, accounting
- Process Management: Support of and co-operation with parallel environments such as MPI/FCI
- Resource Management: Priorities, allocation, queues
- Storage Management: Vendor-dependent storage management software
- LAN Management: SNMP-based management of used LAN components
- User Interface: User-privilege-based management and monitoring
COSMOS – Implementation
[Diagram: three Management Centers, each running a COSMOS Center; Nodes 0 to 3, each running a COSMOS Agent; Processes 0 to 7 running on the nodes]
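The Center/Agent arrangement in this figure can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the report fields, the timeout rule and the placement of processes on nodes are hypothetical and not taken from COSMOS itself.

```python
from dataclasses import dataclass, field

@dataclass
class AgentReport:
    """Hypothetical status message an agent sends for its node."""
    node_id: int
    processes: list   # process ids currently running on the node
    timestamp: float  # time the report was produced, in seconds

@dataclass
class Center:
    """Hypothetical management center collecting agent reports."""
    timeout: float = 5.0
    reports: dict = field(default_factory=dict)

    def receive(self, report):
        # Keep only the most recent report per node.
        self.reports[report.node_id] = report

    def node_states(self, now):
        # A node whose last report is older than the timeout is
        # flagged as "suspect" so that fault handling can step in.
        return {
            node_id: "up" if now - rep.timestamp <= self.timeout else "suspect"
            for node_id, rep in self.reports.items()
        }

    def single_system_view(self):
        # Flatten all per-node process lists into one system-wide list.
        return sorted(p for rep in self.reports.values() for p in rep.processes)

if __name__ == "__main__":
    center = Center(timeout=5.0)
    center.receive(AgentReport(node_id=0, processes=[0, 1], timestamp=100.0))
    center.receive(AgentReport(node_id=1, processes=[2, 3], timestamp=90.0))
    print(center.node_states(now=101.0))
    print(center.single_system_view())
```

Keeping the per-node state inside the center is what gives the operator a single-system view; running several such centers, as the figure suggests, would be one way to keep the management layer itself highly available.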
Gridware GRD/Codine
- Powerful resource management
  - Integrates resource and batch management
  - Ticket-based job scheduling scheme
  - Well-defined interfaces
- Some drawbacks at this moment
  - GRD/Codine is not topology-aware
  - GRD/Codine is a commercial product
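As a generic illustration of what a ticket-based scheduling scheme can look like, the sketch below gives each owner a ticket budget, splits it evenly over that owner's pending jobs and dispatches jobs in order of decreasing tickets. This is not GRD/Codine's actual policy or API; all names and the even-split rule are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    owner: str

def dispatch_order(jobs, owner_tickets):
    """Return jobs sorted by the tickets each one receives.

    Each owner's tickets are divided evenly among that owner's
    pending jobs; ties keep their submission order because
    Python's sort is stable.
    """
    pending = {}
    for job in jobs:
        pending[job.owner] = pending.get(job.owner, 0) + 1

    def tickets(job):
        return owner_tickets.get(job.owner, 0) / pending[job.owner]

    return sorted(jobs, key=tickets, reverse=True)

if __name__ == "__main__":
    queue = [Job("a1", "alice"), Job("a2", "alice"), Job("b1", "bob")]
    order = dispatch_order(queue, {"alice": 100, "bob": 80})
    print([job.name for job in order])
```

In this toy policy an owner who floods the queue does not gain priority, because the ticket budget is shared among all of that owner's jobs.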
COSMOS – Interaction with GRD/Codine
System Management (COSMOS):
- Node Management
- SAN Management
- Process Management
- Storage Management
- LAN Management
- User Interface

GRD/Codine:
- Node Monitoring
- Process Monitoring
- Resource Management
- User Interface
- Accounting

[Diagram: the two module stacks side by side, with an additional "Resource Management" label]
Roadmap of COSMOS Development
- Prototype release plan for COSMOS
  - 1Q2000 – Centralised process and SAN management
  - 2Q2000 – Distributed system management framework
  - 3Q2000 – Complete non-interactive management
  - 4Q2000 – Complete interactive management
- Interaction between COSMOS & GRD/Codine
  - Transfer of topology and configuration information
  - Exchange of monitoring information
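One way to picture the transfer of topology information to a scheduler that is not topology-aware is a small, self-describing message. The sketch below serialises a node and link list as JSON and lets the receiving side look up a node's neighbours. The message layout, field names and function names are hypothetical; the slides do not specify an exchange format.

```python
import json

def export_topology(nodes, links):
    """Serialise node states and SAN links into a JSON message.

    nodes: dict mapping node id -> state string (assumed layout)
    links: iterable of (node_a, node_b) pairs
    """
    doc = {
        "nodes": [{"id": n, "state": s} for n, s in sorted(nodes.items())],
        "links": [sorted(link) for link in links],
    }
    return json.dumps(doc, sort_keys=True)

def neighbours(message, node_id):
    """Receiver side: list the nodes directly linked to node_id."""
    doc = json.loads(message)
    return sorted(
        b if a == node_id else a
        for a, b in doc["links"]
        if node_id in (a, b)
    )

if __name__ == "__main__":
    msg = export_topology(
        {0: "up", 1: "up", 2: "down", 3: "up"},
        [(0, 1), (1, 2), (2, 3), (3, 0)],
    )
    print(msg)
    print(neighbours(msg, 0))
```

With such a message a resource manager could, for example, prefer allocations whose nodes are direct neighbours, which addresses the topology-awareness gap noted for GRD/Codine.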