LLNL-PRES-770721This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
New Corona System & CTS-2 UpdateMarch 2019 LC User Meeting
Matt LeiningerCTS-2 POC
March 28, 2019
2LLNL-PRES-xxxxxx
Corona is a Follow-on to Catalyst:First AMD GPU Cluster for HPC, ML, and Data Science
GbE Management Net
164 2-Socket 24-Core Compute Nodes + 328 GPUs
ManagementNode
LoginNode
Lustre Parallel File SystemMD MD…
Infiniband SAN
Mellanox HDR High Performance Interconnect
Mgmt LoginGW1 …..GW2 GW3 GW4 Gateway
Nodes
AMDNaples
0-23
AMD Naples24-47
Remote ServerManagement
PCIe3 x16
1Gbe
InfiniBandHCA IPMIHigh performance
Network fabric
PCIe GPGPUPCIe3 x16PCIe GPGPUPCIe3 x16PCIe3 x8
PCIe GPGPUPCIe NVRAM
PCIe3 x16PCIe GPGPUPCIe3 x16
Node• AMD Naples 24-core 2.0 GHz• Memory: 256 GB; 5.3 GB/core• Memory BW: > 300 GB/s DDR• 1.6 TB NVMe• Mellanox HDR100• 4 GPU per compute node
System Nodes• 82 CPU-only nodes• 82 CPU+GPU• 4 Gateways• 1 Login• 1 Management
3LLNL-PRES-xxxxxx
Corona Highlights
. TF/s5. TF/s
10. TF/s15. TF/s20. TF/s25. TF/s30. TF/s35. TF/s40. TF/s
MI-25 MI-60 V100
GPU FP Performance
FP64 FP32 PF16
TF/s
5,000 TF/s
10,000 TF/s
15,000 TF/s
20,000 TF/s
MI-25 MI-60 Total
Corona FP Performance
FP64 PF32 PF16
Considering adding 328 AMD MI-60 GPUs to Corona
112
4LLNL-PRES-xxxxxx
Corona NVMe
NVMeHGST 2N2003 DWPD
Read Write
Sequential @ 128 KiB 3.35 GB/s 2.1 GB/s
Random @ 4 KiB 835K IOPs 200K IOPs
Total 549 TB/s; 137M IOPs 344 TB/s; 32.8M IOPs
5LLNL-PRES-xxxxxx
Corona Software Environment
• Tri-Lab Operating System Software HPC environment as base foundation• TOSS 3.x based on RHEL 7.x• Provides smooth transition for TOSS team and LLNL HPC users• Includes AMD drivers, compilers, etc.• Slurm + Flux scheduler and resource manager
• Additional software for Data science & Machine Learning• Containers supported• Working with early users to explore other software
Corona is onsite and undergoing burn-in. Early User access in April.
6LLNL-PRES-xxxxxx
Commodity Technology Systems
• Status of CTS-2 procurement• Approximate Timeline• Potential Architectures
7LLNL-PRES-xxxxxx
Market surveys
LANL SNL
Market surveys
Update Tech requirements
Release DRAFT RFP
Vendor Selection
Tri-lab negotiations
CTS-2 contract awarded
CTS-2 Market surveys
CTS-2 and TOSS teams continue to work together during CTS-2 deployment & lifetime support
Market surveys
LLNL
Feedback on DRAFT RFP
Final RFP
Jan. 2020
April 2019
August 2019
Sept 2019
Sept-Oct 2019
April 2019 - May 2019
Oct. 2018 – March 2019
Oct. 2018 – March 2019
2018-2019CTS-2 activities leadingto RFP and Contract
8LLNL-PRES-xxxxxx
DRAFTCTS-2 Procurement Timeline
2019
Market Survey Begins
CTS-2 contract awarded
Jan. Feb. March April May June July Aug. Sept. Oct. Nov. Dec.
2020
2021-2023
Jan. Feb. March April May June July Aug. Sept. Oct. Nov. Dec.
Jan. Feb. March April May June July Aug. Sept. Oct. Nov. Dec.
Oct. Nov. Dec.2018Release DRAFT
CTS-2 RFP
Release Final
CTS-2 RFP
CTS-2 Proposal Review & Vendor
Selection
Contract Negotiations
Complete
CTS-2 SU: Phase 0 Deliveries
Begin softwareIntegration with TOSS
TOSS Early Evaluation System
PotentialArchitecture
Decision Point
CTS-2 SU: Phase 1 Deliveries CTS-2 SU: Phase 2 Deliveries
Deliveries may start in 2H2020ASC Deployments may start in 1H2021
9LLNL-PRES-xxxxxx
Potential CTS-2 Node Design
CPU0-47
CPU48-95
IPC Link
Remote ServerManagement
32-64 GB DIMMs DDR532GB x 8 DIMMs = 256 GB/socket> 200 GB/s per socket
High-SpeedNetwork HCA
High performanceNetwork fabric
CPU Architecture & Software Readiness are key aspect of CTS-2 Selection• Intel Xeon, AMD Epyc, Marvell ThunderX, IBM Power all viable processors•Maturity of platform?•TOSS support•Maturity of system software and overall software ecosystem?•Cost/performance of platform?
What about GPU systems and HBM memory?
10LLNL-PRES-xxxxxx
Bringing ATS features to CTS-2
• GPU are becoming more widely adopted• Past commodity procurements were dominated by CPU-only SU’s• GPU system will be available under CTS-2• Programs responsible for determining the mix of CPU-only + GPU
nodes/clusters best address workloads• How much GPU memory do you need?• What is the ratio of CPU’s to GPU’s?• Is hardware support for unified memory required?• Can all codes utilize GPU’s?• Can all workloads utilize GPU’s – 3D vs 2D?
11LLNL-PRES-xxxxxx
Bringing ATS features to CTS-2
• Give me the fast GPU memory but on CPU’s!!!
• Today’s GPU utilize High Bandwidth Memory (HBM v2 or HBM2)
• CPU + HBM may be a nice architecture for CTS
• Time to market is likely 2022+
• High Bandwidth Memory provides
— ~3X more bandwidth per socket
— ~4X less memory capacity per socket
— 1-1.5 GB/core – adapt applications accordingly
• CTS-2 will include options for CPU+HBM if/when available
DisclaimerThis document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC.The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
Top Related