Beyond the Desktop
The role of computational architectures in accelerating discovery
Mohammed Khaleel, Ph.D.
Outline
- High-performance computing systems
- Beyond the Desktop
- Traditional (or “mainstream”) supercomputers
  - Science applications
- Multithreaded supercomputers
  - Cybersecurity applications
- Energy Efficiency
- Back to the Desktop
High-Performance Computing Systems
- Nowadays, HPC systems are parallel computing systems
  - Consisting of hundreds of processors (or more)
  - Connected by high-bandwidth, low-latency networks
  - Collections of PCs connected by Ethernet are not HPC systems
- The basic building block is a node: a server-like computer (a few processor sockets, memory, network interconnect cards, and possibly I/O devices)
- Nodes are parallel computers in their own right: they usually contain two or more processor sockets with multiple cores per processor
  - Looks very similar to what you have on your desktop PC!
- HPC systems have a multiplicity of applications in scientific and engineering areas: physics, chemistry, biology, material design, mechanical design
HPC Systems (cont.)
Two basic kinds of HPC systems:
- Distributed memory systems
- Shared memory systems
Distributed memory HPC systems:
- The typical HPC system: processors only have direct access to local memory on the node
- Remote memory on other nodes must be accessed indirectly via a library call
- Can scale to tens and hundreds of thousands of processors (Blue Gene/P @ LLNL, Chinook @ EMSL/PNNL)
Shared memory HPC systems:
- Processors have direct access to local memory on the node and to remote memory on other nodes
- Speed of access may vary
- More difficult to scale beyond a few thousand processors (Columbia SGI Altix @ NASA)
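The distributed memory access pattern described above can be sketched with a toy model. This is only an illustration: `Node`, `local_read`, and `remote_get` are hypothetical names, and `remote_get` stands in for whatever communication library a real distributed memory system would use.

```python
class Node:
    """A node with its own private memory (toy model, hypothetical names)."""

    def __init__(self, node_id, data):
        self.node_id = node_id
        self.memory = list(data)   # only this node reads this list directly
        self.remote_calls = 0      # library calls issued by this node

    def local_read(self, index):
        # Direct access: an ordinary load from the node's own memory.
        return self.memory[index]

    def remote_get(self, other, index):
        # Indirect access: stands in for a library call that moves data
        # between nodes over the interconnect.
        self.remote_calls += 1
        return other.local_read(index)


def global_sum(nodes):
    """Sum every element in the system from the point of view of node 0."""
    root = nodes[0]
    total = sum(root.local_read(i) for i in range(len(root.memory)))
    for other in nodes[1:]:
        for i in range(len(other.memory)):
            total += root.remote_get(other, i)
    return total


# Three nodes holding four values each: 0..3, 4..7, 8..11.
nodes = [Node(n, range(n * 4, n * 4 + 4)) for n in range(3)]
total = global_sum(nodes)
print(total, nodes[0].remote_calls)  # 66 8
```

On a shared memory system the loop over other nodes would use the same direct reads as the local loop, only with a different (and possibly variable) access time.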
Chinook (supercomputer at EMSL/PNNL)
- 2310-node HP cluster
- Dual quad-core processors per node
- Total: 18,480 cores

| Feature | Detail |
| --- | --- |
| Interconnect | DDR InfiniBand (Voltaire, Mellanox) |
| Node | Dual quad-core AMD Opteron, 16 GB memory |
| Local scratch | 400 MB/s per node, 924 GB/s aggregate; 440 GB per node, 1 PB aggregate |
| Global scratch | 30 GB/s, 250 TB total |
| User /home | 1 GB/s, 20 TB total |
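The aggregate figures quoted for Chinook follow from the per-node numbers. A short check, assuming decimal units (1 GB = 1000 MB, 1 PB = 1,000,000 GB), which the slide does not state:

```python
NODES = 2310
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4
LOCAL_SCRATCH_MB_PER_S = 400   # per node
LOCAL_SCRATCH_GB = 440         # per node

total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
# Assumption: decimal prefixes throughout.
aggregate_scratch_gb_per_s = NODES * LOCAL_SCRATCH_MB_PER_S / 1000
aggregate_scratch_pb = NODES * LOCAL_SCRATCH_GB / 1_000_000

print(total_cores)                     # 18480
print(aggregate_scratch_gb_per_s)      # 924.0
print(round(aggregate_scratch_pb, 2))  # 1.02
```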
Chinook cluster architecture
[Figure: Chinook cluster architecture diagram. Recoverable labels:]
- Chinook InfiniBand core: four 288-port IB switches
- Computational Units 1 to 12, each with a 288-port IB switch and GigE; Computational Unit 1 is labeled 192 nodes, 11 racks
- Phase 1: 600 nodes; Phase 2: 2310 nodes
- /home: NFS (PolyServe), 20 TB, 1 GB/s
- /dtemp: SFS (Lustre), 250 TB, 30 GB/s
- Chinook Ethernet core
- Login & admin nodes
- 40 Gbit link; central storage; PNNL network
Chinook software scalability
[Figure: ScalaBLAST scalability plot. Work factor (queries/min/core/mil_db, 0 to 8) versus ncores (1 to 10,000, log scale); series: work factor, ideal.]
NWChem on Chinook (log-log plots)
- Si75O148H66 with DFT: 3554 functions, 2300 electrons
- (H2O)9 with MP2: 828 functions, 90 electrons
- C6H14 with CCSD(T): 264 functions, 50 electrons
Processor Architecture (cont.)
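Memory Wall Problem
[Figure: processor speed versus memory speed.]
Access latencies in the memory hierarchy, in cycles (c):
- L1: 1-2c
- L2: 10c
- L3: 50c
- memory: 500c
W. Wulf and S. McKee, “Hitting the memory wall: Implications of the obvious”, ACM SIGARCH Computer Architecture News, 1995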
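To see why these latencies matter, the average cost of a memory access can be estimated from them. The sketch below uses the cycle counts from the slide (taking 2 cycles for L1); the hit rates are hypothetical values chosen only to contrast a cache-friendly workload with an irregular one.

```python
# Latencies from the slide, in cycles (L1 taken as 2c from the 1-2c range).
LATENCY_CYCLES = {"L1": 2, "L2": 10, "L3": 50, "memory": 500}


def average_latency(hit_rates):
    """Expected cycles per access.

    hit_rates[level] is the fraction of accesses reaching that cache level
    that are served by it; whatever misses in L3 goes to main memory.
    """
    reaching = 1.0   # fraction of all accesses that reach the current level
    average = 0.0
    for level in ("L1", "L2", "L3"):
        served = reaching * hit_rates[level]
        average += served * LATENCY_CYCLES[level]
        reaching -= served
    return average + reaching * LATENCY_CYCLES["memory"]


# Hypothetical hit rates, for illustration only.
cache_friendly = average_latency({"L1": 0.95, "L2": 0.80, "L3": 0.50})
irregular = average_latency({"L1": 0.50, "L2": 0.50, "L3": 0.50})
print(f"cache-friendly: {cache_friendly:.2f} cycles per access")
print(f"irregular:      {irregular:.2f} cycles per access")
```

With these assumed hit rates the irregular workload pays roughly 72 cycles per access against roughly 5 for the cache-friendly one, almost all of it from the 500-cycle trips to main memory.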
Multithreaded Processors
- Commodity memory is slow, custom memory is very expensive: what can be done about it?
- Idea: cover the latency of memory loads with other (useful) computation
- OK, how do we do this?
  - Use multiple execution contexts on the same processor, and switch between them when issuing load operations
  - Execution contexts correspond to threads
- Examples: Cray ThreadStorm processors, Sun Niagara 1 & 2 processors, Intel Hyper-Threading
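The latency-hiding idea can be illustrated with a small cycle-level simulation. This is a toy model, not a description of any real processor: each thread does `compute_cycles` cycles of work, then issues a load and cannot run for `load_latency` cycles, and the processor switches to another ready thread at no cost. The 500-cycle latency is the main-memory figure from the memory wall slide; the 4 cycles of work between loads is an assumed value.

```python
def utilization(num_threads, compute_cycles, load_latency, total_cycles=100_000):
    """Fraction of cycles in which some thread does useful work."""
    ready_at = [0] * num_threads            # cycle at which each thread can run
    work_left = [compute_cycles] * num_threads
    current = 0
    busy = 0
    for cycle in range(total_cycles):
        # Stay on the current thread if it can run; otherwise take the next
        # ready thread in round-robin order. If none is ready, the cycle idles.
        for offset in range(num_threads):
            t = (current + offset) % num_threads
            if ready_at[t] <= cycle:
                current = t
                busy += 1
                work_left[t] -= 1
                if work_left[t] == 0:       # issue the load, then stall
                    ready_at[t] = cycle + 1 + load_latency
                    work_left[t] = compute_cycles
                break
    return busy / total_cycles


def predicted(num_threads, compute_cycles, load_latency):
    """Closed-form estimate for the same toy model."""
    return min(1.0, num_threads * compute_cycles / (compute_cycles + load_latency))


for n in (1, 16, 64, 128):
    print(n, round(utilization(n, 4, 500), 3), round(predicted(n, 4, 500), 3))
```

Under these assumptions a single thread keeps the execution units busy less than 1% of the time, while about 126 threads are enough to cover the 500-cycle latency completely.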
Multithreaded Processors (cont.)
[Figure: hardware threads T0 to T5 sharing the execution units.]
Each thread has its own independent instruction stream (program counter)
Each thread has its own independent register set
Cray XMT multithreaded system
- ThreadStorm processors run at 500 MHz
- 128 hardware thread contexts, each with its own set of 32 registers
- No data cache
- 128 KB, 4-way associative data buffer on the memory side
- Extra bits in each 64-bit memory word: full/empty for synchronization
- Hashed memory at a 64-byte level, i.e. contiguous logical addresses at a 64-byte boundary are mapped to noncontiguous physical locations
- Global shared memory
- Scalable to 8,192 processors
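The full/empty bit can be thought of as a one-word producer/consumer channel: a synchronized write waits for the word to be empty and marks it full, and a synchronized read waits for it to be full and marks it empty. The sketch below models that behavior in software with a condition variable. It is not the Cray programming interface; the class and method names are hypothetical.

```python
import threading


class FullEmptyWord:
    """Software model of a memory word with a full/empty bit (hypothetical)."""

    def __init__(self):
        self._value = None
        self._full = False
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Wait until the word is empty, store the value, mark it full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Wait until the word is full, return the value, mark it empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def producer_consumer(count):
    """Pass `count` integers from one thread to another through one word."""
    word = FullEmptyWord()
    received = []

    def consumer():
        for _ in range(count):
            received.append(word.read_when_full())

    thread = threading.Thread(target=consumer)
    thread.start()
    for i in range(count):
        word.write_when_empty(i)
    thread.join()
    return received


print(producer_consumer(5))  # [0, 1, 2, 3, 4]
```

On the XMT the waiting is handled by the memory system per 64-bit word, so no separate lock object is needed; the condition variable here only imitates that behavior.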
Cray XMT multithreaded system (cont.)
[Figure labels: 4 DIMM slots; Cray SeaStar2™ (four instances); L0 RAS computer; redundant VRMs.]
High-Performance String Matching on the Cray XMT
- Fast, scalable string matching is at the base of modern cybersecurity applications
- Deep packet inspection for malware
- Performance has to be consistent and content independent
- At the same time, the system should be flexible and programmable
- Prevent content-based attacks
- Excellent scalability and performance on the XMT
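The slide does not name the matching algorithm. As one well-known way to get content-independent cost, the sketch below builds a dense Aho-Corasick automaton over bytes, so scanning does exactly one table lookup per input byte regardless of the input. The function names and the sample patterns are illustrative assumptions, not taken from the talk.

```python
from collections import deque


def build_automaton(patterns, alphabet_size=256):
    """Build a dense Aho-Corasick DFA for a list of bytes patterns.

    Returns (goto, outputs): goto[state][byte] is the next state and
    outputs[state] lists the patterns that end in that state.
    """
    goto = [[-1] * alphabet_size]
    outputs = [[]]
    for pattern in patterns:                 # phase 1: build the trie
        state = 0
        for byte in pattern:
            if goto[state][byte] == -1:
                goto.append([-1] * alphabet_size)
                outputs.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        outputs[state].append(pattern)

    fail = [0] * len(goto)                   # phase 2: breadth-first completion
    queue = deque()
    for byte in range(alphabet_size):
        nxt = goto[0][byte]
        if nxt == -1:
            goto[0][byte] = 0                # unmatched bytes stay at the root
        else:
            queue.append(nxt)
    while queue:
        state = queue.popleft()
        outputs[state] = outputs[state] + outputs[fail[state]]
        for byte in range(alphabet_size):
            nxt = goto[state][byte]
            if nxt == -1:
                # Missing edge: reuse the transition of the failure state.
                goto[state][byte] = goto[fail[state]][byte]
            else:
                fail[nxt] = goto[fail[state]][byte]
                queue.append(nxt)
    return goto, outputs


def scan(data, goto, outputs):
    """Return (start_offset, pattern) for every match in data."""
    state = 0
    matches = []
    for offset, byte in enumerate(data):
        state = goto[state][byte]            # one lookup per input byte
        for pattern in outputs[state]:
            matches.append((offset + 1 - len(pattern), pattern))
    return matches


goto, outputs = build_automaton([b"he", b"she", b"his", b"hers"])
print(scan(b"ushers", goto, outputs))
# [(1, b'she'), (2, b'he'), (2, b'hers')]
```

The dense table trades memory for predictability: the lookups are data-dependent and hard to cache, which is the kind of access pattern a latency-tolerant multithreaded machine is meant to handle.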
The EPA reports that the energy used in the U.S. for servers and data centers is significant.
- ~61 billion kilowatt-hours (kWh) in 2006
  - 1.5% of total electricity consumption
  - Total electricity cost of about $4.5 billion
  - Similar to the amount of electricity consumed by approximately 5.8 million average U.S. households (or about five percent of the total housing stock)
- Federal servers and data centers alone:
  - ~6 billion kWh
  - 10% of electricity used for servers and data centers
  - Total electricity cost of about $450 million annually
Source: EPA Report to Congress on Server and Data Center Energy Efficiency, released on August 2, 2007 in response to Public Law 109-431
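The EPA figures above can be cross-checked against each other. The inputs below are the rounded numbers from the slide; the derived values (price per kWh, usage per household, implied national total) are back-of-the-envelope results, not numbers quoted from the report.

```python
TOTAL_KWH = 61e9            # servers and data centers, 2006
TOTAL_COST_USD = 4.5e9
SHARE_OF_US_ELECTRICITY = 0.015
HOUSEHOLDS = 5.8e6
FEDERAL_KWH = 6e9
FEDERAL_COST_USD = 450e6

price_per_kwh = TOTAL_COST_USD / TOTAL_KWH
kwh_per_household = TOTAL_KWH / HOUSEHOLDS
federal_share = FEDERAL_KWH / TOTAL_KWH
federal_price_per_kwh = FEDERAL_COST_USD / FEDERAL_KWH
implied_us_total_kwh = TOTAL_KWH / SHARE_OF_US_ELECTRICITY

print(f"average price: ${price_per_kwh:.3f} per kWh")
print(f"per household: {kwh_per_household:,.0f} kWh per year")
print(f"federal share: {federal_share:.1%}")
print(f"federal price: ${federal_price_per_kwh:.3f} per kWh")
print(f"implied U.S. total: {implied_us_total_kwh / 1e12:.2f} trillion kWh")
```

The two cost figures imply nearly the same price (about 7.4 and 7.5 cents per kWh), and the federal share comes out just under the quoted 10%.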
Current Power Usage by Chinook, MSCF System at PNNL
- Chinook (160 TF peak) has 2310 dual-socket quad-core AMD Opteron (2.2 GHz) based servers from HP, each with 16 GB memory, 365 GB local disk, a DDR InfiniBand interconnect, and 297 TB global disk
- Consumes nearly 1.9 MW
  - ~1/3 for cooling
  - ~2/3 compute power (1.25 MW)
- 40% of compute power is lost to power delivery (rectifier, UPS, feed, PDU, power supply, voltage regulator)
- Average power efficiency for HPL
  - no losses: 133 MFlop/s/W
  - with power delivery losses: 80 MFlop/s/W
  - with power and cooling delivery losses: 52 MFlop/s/W
[Figure: total power split into cooling and compute power. Annotations: “40% of compute power lost in power delivery”; “Top500 measures here”.]
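The three efficiency figures are linked by the loss fractions quoted on the slide. The check below reproduces them approximately. The peak-performance line assumes 4 double-precision floating-point operations per core per cycle, which is not stated in the talk, and the implied HPL rate is derived here rather than quoted.

```python
TOTAL_POWER_MW = 1.9
COMPUTE_POWER_MW = 1.25          # about 2/3 of the total
DELIVERY_LOSS = 0.40             # share of compute power lost in delivery
NO_LOSS_MFLOPS_PER_W = 133       # HPL efficiency at the components

CORES = 2310 * 2 * 4
CLOCK_GHZ = 2.2
FLOPS_PER_CYCLE = 4              # assumption, not from the slides

compute_fraction = COMPUTE_POWER_MW / TOTAL_POWER_MW
with_delivery = NO_LOSS_MFLOPS_PER_W * (1 - DELIVERY_LOSS)
with_delivery_and_cooling = with_delivery * compute_fraction

# Power that actually reaches the components, and the HPL rate it implies.
# MFlop/s/W times MW gives TFlop/s.
component_power_mw = COMPUTE_POWER_MW * (1 - DELIVERY_LOSS)
implied_hpl_tflops = NO_LOSS_MFLOPS_PER_W * component_power_mw

peak_tflops = CORES * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(f"with delivery losses:      {with_delivery:.1f} MFlop/s/W")
print(f"with delivery and cooling: {with_delivery_and_cooling:.1f} MFlop/s/W")
print(f"implied HPL rate:          {implied_hpl_tflops:.1f} TFlop/s")
print(f"peak under the assumption: {peak_tflops:.1f} TFlop/s")
```

The results land within about 1% of the slide's 80 and 52 MFlop/s/W, and the assumed flops-per-cycle value gives a peak close to the quoted 160 TF.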
Regional Weather Forecasting (WRF)
- Multiple concurrent basic 4.5-day weather forecasts for North and Central America
- Initialization: 1° Global Forecast System analysis from the National Weather Service
- Decomposition: 480x480 Cartesian grid (15 km) with 45 levels
- Solver: horizontal: explicit high-order Runge-Kutta; vertical: implicit
- Output: asynchronous 2.3 GB netCDF every 3 model-hours per forecast
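The size of each forecast follows from the parameters above. A quick calculation, assuming one output file per 3 model-hours and not counting any file written at the initial time:

```python
GRID_X = 480
GRID_Y = 480
LEVELS = 45
SPACING_KM = 15
FORECAST_DAYS = 4.5
OUTPUT_INTERVAL_HOURS = 3
OUTPUT_SIZE_GB = 2.3

grid_points = GRID_X * GRID_Y * LEVELS
domain_side_km = GRID_X * SPACING_KM
forecast_hours = FORECAST_DAYS * 24
# Assumption: the initial state is not written as an extra output file.
outputs_per_forecast = int(forecast_hours // OUTPUT_INTERVAL_HOURS)
output_gb_per_forecast = round(outputs_per_forecast * OUTPUT_SIZE_GB, 1)

print(grid_points)             # 10368000
print(domain_side_km)          # 7200
print(outputs_per_forecast)    # 36
print(output_gb_per_forecast)  # 82.8
```

Each forecast therefore covers a 7,200 km square with about 10.4 million grid points and writes roughly 83 GB.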
QM Computational Chemistry (CP2K)
- Multiple concurrent liquid-vapor interface model simulations
- Initialization: standard slab geometry (15 x 15 x 71 Å³)
- Decomposition: 215 H2O with a single hydroxide ion
- Solver: Density Functional Theory with dual basis set (Gaussian & plane-wave) in conjunction with molecular dynamics and umbrella sampling
- Output: synchronous 75 MB per 20k 0.5 fs model-steps (MD time step)
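A few derived quantities make the scale of this simulation easier to read. The inputs are from the slide; the volume per water molecule (about 29.9 Å³ at bulk liquid density) is an added assumption used only to estimate how much of the slab box is liquid.

```python
WATERS = 215
HYDROXIDE_IONS = 1
BOX_ANGSTROM = (15, 15, 71)
TIME_STEP_FS = 0.5
STEPS_PER_OUTPUT = 20_000
OUTPUT_MB = 75
WATER_VOLUME_A3 = 29.9   # assumption: bulk liquid water, about 1 g/cm^3

atoms = WATERS * 3 + HYDROXIDE_IONS * 2
box_volume_a3 = BOX_ANGSTROM[0] * BOX_ANGSTROM[1] * BOX_ANGSTROM[2]
ps_per_output = STEPS_PER_OUTPUT * TIME_STEP_FS / 1000
mb_per_ps = OUTPUT_MB / ps_per_output
liquid_fraction = WATERS * WATER_VOLUME_A3 / box_volume_a3

print(atoms)                      # 647
print(box_volume_a3)              # 15975
print(ps_per_output)              # 10.0
print(mb_per_ps)                  # 7.5
print(round(liquid_fraction, 2))  # 0.4
```

Each output covers 10 ps of simulated time. Under the density assumption the water fills roughly 40% of the box, with the rest left for the vapor region of the liquid-vapor interface.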
Device Under Test: NW-Ice
- 192 servers, 2.3 GHz Intel (quad-core) Clovertown, 16 GB DDR2 FBDIMM memory, 160 GB SATA local scratch, DDR2 InfiniBand NIC
- Five racks with evaporative cooling at processors
- Two racks air cooled
- Lustre global file system
  - 34 TB mounted
  - 49 TB provisioned
Contributors to Power Consumption: Power Distribution
- Data center:
  - Power distribution units
  - Power supply units
  - Voltage regulators
- Facility:
  - Transformers
  - Rectifiers
  - UPS
  - Inverters
Contributors to Power Consumption: Cooling Chain
- Data center:
  - Air handlers
  - Closely coupled cooling systems
  - HVAC
- Machine plant:
  - Pumps
  - Chillers
  - Cooling towers
  - Economizers
Back to the Desktop…
- Historically, most technologies that have appeared in high-end supercomputers have eventually migrated to the desktop
  - Hardware units for numerical computation
  - Superscalar execution
  - Parallel processing (we’re observing it right now)
- In the future, it is expected that most of the technologies I presented today will eventually migrate back to desktop machines
  - High-end interconnects between cores & processors
  - Multithreading capabilities
- Commercial data centers are already looking for ways to improve their energy management