IBM Presentations: Smart Planet Template · 1. Chip 16 cores 2. Module Single Chip 4. Node Card 32...
![Page 1: IBM Presentations: Smart Planet Template · 1. Chip 16 cores 2. Module Single Chip 4. Node Card 32 Compute Cards, Optical Modules, Link Chips, Torus 5a. Midplane 16 Node Cards 6.](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/1.jpg)
![Page 2](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/2.jpg)
![Page 3](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/3.jpg)
![Page 4](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/4.jpg)
![Page 5](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/5.jpg)
![Page 6](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/6.jpg)
Fujitsu K Computer
• SPARC64 VIIIfx CPUs: 8-core, 2.0 GHz
• 8 floating-point ops per cycle (per core)
• Custom Tofu interconnect
• Approx. 800 racks total, water cooled

New #1 – K Computer at RIKEN Advanced Institute for Computational Science, Japan
• Rpeak (Jun’11): 17,136 (nodes) x 4 (sockets) x 8 (cores) x 8 (FP/cycle) x 2.0 (GHz) = 8.773632 PFlops
• 3.2 times the previous #1

| | Jun’11 | Nov’11 |
|---|---|---|
| Nodes | 17,136 | 22,032 |
| Rpeak | 8.774 PFlops | 11.280 PFlops |
| Rmax | 8.162 PFlops | 10.51 PFlops |
| Linpack efficiency | 93% | |
| Computer power consumption | 9.898 MW | 12.66 MW |
| Power efficiency | 824.6 MF/W | 830.2 MF/W |
| Share of aggregate throughput | 13.8% | 14.2% |
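The K Computer figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide (including its accounting of 4 sockets per node); the function names are illustrative, not from any tool.

```python
# Cross-check of the K Computer figures, using only numbers quoted on the slide.
SOCKETS_PER_NODE = 4     # slide's accounting: 4 sockets per "node"
CORES_PER_SOCKET = 8
FLOPS_PER_CYCLE = 8      # per core
CLOCK_GHZ = 2.0

def rpeak_pflops(nodes: int) -> float:
    """Theoretical peak: nodes x sockets x cores x FP/cycle x GHz, in PFlops."""
    gflops = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET * FLOPS_PER_CYCLE * CLOCK_GHZ
    return gflops / 1e6  # 1 PFlops = 1e6 GFlops

def linpack_efficiency(rmax_pflops: float, rpeak: float) -> float:
    """Fraction of theoretical peak achieved on Linpack."""
    return rmax_pflops / rpeak

def mflops_per_watt(rmax_pflops: float, megawatts: float) -> float:
    """PFlops -> MFlops is x1e9; MW -> W is x1e6."""
    return (rmax_pflops * 1e9) / (megawatts * 1e6)

for label, nodes, rmax, mw in [("Jun'11", 17_136, 8.162, 9.898),
                              ("Nov'11", 22_032, 10.51, 12.66)]:
    peak = rpeak_pflops(nodes)
    print(f"{label}: Rpeak {peak:.3f} PF, "
          f"efficiency {linpack_efficiency(rmax, peak):.1%}, "
          f"{mflops_per_watt(rmax, mw):.1f} MF/W")
```

Running it reproduces the slide's 8.773632 PF Rpeak, the 93% Linpack efficiency and the 824.6 / 830.2 MF/W power efficiencies, and shows that the Nov’11 Rpeak of 11.280 PF follows from 22,032 nodes.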
![Page 7](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/7.jpg)
Fortran C/C++
MPI OpenMP
CUDA OpenCL
OpenACC
HMPP
MIC
Scout
X10
CAF
Cilk
StarSs
G.Array
UPC
Chapel
![Page 8](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/8.jpg)
![Page 9](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/9.jpg)
![Page 10](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/10.jpg)
![Page 11](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/11.jpg)
![Page 12](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/12.jpg)
![Page 13](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/13.jpg)
![Page 14](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/14.jpg)
![Page 15](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/15.jpg)
![Page 16](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/16.jpg)
![Page 17](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/17.jpg)
![Page 18](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/18.jpg)
![Page 19](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/19.jpg)
![Page 20](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/20.jpg)
The Mont-Blanc Project
• To develop a European exascale approach
• Based on embedded power-efficient technology
Taken from Alex Ramirez’s presentation, BSC
![Page 21](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/21.jpg)
The Mont-Blanc Project
Integrated system design built from mobile / embedded components
• ARM multicore processors
  – Nvidia Tegra / Denver, Calxeda, Marvell Armada, ST-Ericsson Nova A9600, …
• Mobile accelerators
  – Mobile GPU (Nvidia GT 500M, etc.)
  – Embedded GPU (Nvidia Tegra, ARM Mali T604)
• Low-power 10 GbE switches
![Page 22](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/22.jpg)
The Mont-Blanc Project
• Exploit a massive number of low-power processors
• Sustain performance with lower-bandwidth components (i.e., interconnect and memory)
• Programmability
![Page 23](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/23.jpg)
IBM Blue Gene/Q
1. Chip: 16 cores
2. Module: single chip
3. Compute Card: one single-chip module, 16 GB DDR3 memory
4. Node Card: 32 compute cards, optical modules, link chips, torus
5a. Midplane: 16 node cards
5b. I/O Drawer: 8 I/O cards w/ 16 GB, 8 PCIe Gen2 slots
6. Rack: 2 midplanes; 1, 2 or 4 I/O drawers
7. System: 96 racks @ 20 PF/s

Per rack
• Peak performance: 209 TF
• Sustained (Linpack): ~170+ TF
• Power: ~100 kW
• Power efficiency: ~2 GF/W

Scalability
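Multiplying out the packaging levels shows how the per-rack and system figures follow from the hierarchy. This is a minimal sketch using the counts listed above; the 209.7 TF per-rack peak is taken from the BG/Q specification table, and the constant names are illustrative only.

```python
# Roll up the Blue Gene/Q packaging hierarchy using the counts on the slide.
CORES_PER_CHIP = 16
CHIPS_PER_COMPUTE_CARD = 1        # "one single chip module"
COMPUTE_CARDS_PER_NODE_CARD = 32
NODE_CARDS_PER_MIDPLANE = 16
MIDPLANES_PER_RACK = 2
RACKS_PER_SYSTEM = 96
GB_PER_COMPUTE_CARD = 16          # 16 GB DDR3 per compute card
PEAK_TF_PER_RACK = 209.7          # from the BG/Q specification table

def per_rack(count_per_node_card: int) -> int:
    """Scale a per-node-card count to a full rack (2 midplanes x 16 node cards)."""
    return count_per_node_card * NODE_CARDS_PER_MIDPLANE * MIDPLANES_PER_RACK

nodes_per_rack = per_rack(COMPUTE_CARDS_PER_NODE_CARD * CHIPS_PER_COMPUTE_CARD)
cores_per_rack = nodes_per_rack * CORES_PER_CHIP
memory_gb_per_rack = nodes_per_rack * GB_PER_COMPUTE_CARD
system_peak_pf = PEAK_TF_PER_RACK * RACKS_PER_SYSTEM / 1000  # TF -> PF

print(f"nodes/rack: {nodes_per_rack}, cores/rack: {cores_per_rack}")
print(f"memory/rack: {memory_gb_per_rack} GB, system peak: {system_peak_pf:.1f} PF")
```

The result (1,024 nodes and 16,384 cores per rack, about 20.1 PF for 96 racks) is consistent with the "96 racks @ 20 PF/s" system figure.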
![Page 24](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/24.jpg)
Blue Gene/Q: Ultra Low Power, Dense Parallel System

| | BG/Q |
|---|---|
| Processor | 64-bit PowerPC |
| Processor frequency | 1.6 GHz |
| Nodes/rack x cores | 1024 x 16 |
| Memory/core | 1 GB |
| Memory bandwidth | 43 GB/s |
| Cores/rack | 16,384 |
| Peak performance/rack | 209.7 TF |
| Average power/rack | 65 kW |
| Availability | 1H12 |
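The table's entries can be combined to see what they imply per core and per node. A small sketch under one stated assumption: the 43 GB/s memory bandwidth is treated as a per-node figure, which the table itself does not specify.

```python
# What do the BG/Q table entries imply per core and per node?
CLOCK_HZ = 1.6e9
CORES_PER_NODE = 16
NODES_PER_RACK = 1024
PEAK_FLOPS_PER_RACK = 209.7e12
MEMORY_GB_PER_CORE = 1
MEM_BW_GBS = 43   # ASSUMPTION: interpreted as per-node bandwidth

cores_per_rack = NODES_PER_RACK * CORES_PER_NODE
# Peak flops divided by (cores x clock) gives floating-point ops per cycle per core.
flops_per_cycle_per_core = PEAK_FLOPS_PER_RACK / (cores_per_rack * CLOCK_HZ)
peak_gflops_per_node = PEAK_FLOPS_PER_RACK / NODES_PER_RACK / 1e9
# Memory bandwidth available per peak floating-point operation.
bytes_per_flop = MEM_BW_GBS / peak_gflops_per_node

print(f"cores/rack: {cores_per_rack}")
print(f"implied flops/cycle/core: {flops_per_cycle_per_core:.2f}")
print(f"memory/node: {MEMORY_GB_PER_CORE * CORES_PER_NODE} GB")
print(f"memory bandwidth per peak flop: {bytes_per_flop:.2f} bytes/flop")
```

The peak figure implies 8 floating-point operations per cycle per core, and the per-node memory (16 x 1 GB) matches the 16 GB compute card; under the per-node bandwidth reading, each peak flop has roughly 0.2 bytes of memory bandwidth.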
![Page 25](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/25.jpg)
![Page 26](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/26.jpg)
![Page 27](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/27.jpg)
![Page 28](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/28.jpg)
![Page 29](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/29.jpg)
![Page 30](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/30.jpg)
| Application | Performance difference (%) | Power consumption difference (%) |
|---|---|---|
| CP2K | 8.5 | 16.3 |
| SEISSOL | 10.9 | 18.2 |
| GADGET | 7.2 | 18.7 |
| LBDC | 4.5 | 16.5 |
| NAMD | 6.0 | 19.7 |
| WRF | 4.5 | 13.6 |
| Lesli3d | 1.7 | 13.1 |
| GemsFDTD | 1.3 | 13.1 |
| BQCD | 0.0 | 13.6 |
| WALBERLA | 0.0 | 13.7 |
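Because energy is power multiplied by time, the two columns can be combined into a net energy-to-solution change per application. The sketch below ASSUMES that "performance difference" is the relative increase in run time and "power consumption difference" is the relative drop in average power, both against the same baseline run; the table does not state the baseline or the exact definitions.

```python
# Net energy-to-solution change per application, ASSUMING:
#   perf. diff  = relative increase in run time (%)
#   power diff  = relative decrease in average power (%)
# both measured against the same (unstated) baseline run.
RESULTS = {  # application: (perf. diff %, power diff %), from the table
    "CP2K": (8.5, 16.3),
    "SEISSOL": (10.9, 18.2),
    "GADGET": (7.2, 18.7),
    "LBDC": (4.5, 16.5),
    "NAMD": (6.0, 19.7),
    "WRF": (4.5, 13.6),
    "Lesli3d": (1.7, 13.1),
    "GemsFDTD": (1.3, 13.1),
    "BQCD": (0.0, 13.6),
    "WALBERLA": (0.0, 13.7),
}

def energy_saving_pct(runtime_increase_pct: float, power_saving_pct: float) -> float:
    """Energy = power x time, so E_new / E_old = (1 - dP) * (1 + dT)."""
    ratio = (1 - power_saving_pct / 100) * (1 + runtime_increase_pct / 100)
    return (1 - ratio) * 100

for app, (perf, power) in RESULTS.items():
    print(f"{app:10s} energy saving ~ {energy_saving_pct(perf, power):5.1f} %")
```

Under this reading every application still saves energy, because the power reduction outweighs the longer run time. If the performance column instead measures a drop in throughput, the ratio becomes (1 - dP) / (1 - dT), which gives slightly smaller savings.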
![Page 31](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/31.jpg)
![Page 32](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/32.jpg)
– MTTDL ~ 7 years (simulated; disk MTTF = 600K hours, Weibull failure distribution, 100 PB usable)
![Page 33](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/33.jpg)
Software / De-clustered RAID
Figure: Traditional RAID vs. declustered RAID on 22 HDDs (labels: Failure, Read, Write)
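The contrast between traditional and declustered RAID comes down to which disks must be read during a rebuild. In traditional RAID, only the failed disk's fixed group is read. In declustered RAID, stripes are spread over all 22 HDDs, so the rebuild load is shared by every surviving disk. This toy Python model illustrates the idea. The stripe width of 11, the 2,000 stripes and the uniform random placement are hypothetical choices, not the layout of any particular product. For simplicity it also assumes that every surviving strip of an affected stripe is read.

```python
import random
from collections import Counter

N_DISKS = 22         # from the slide
STRIPE_WIDTH = 11    # hypothetical: disks per stripe (data + parity)
N_STRIPES = 2000     # hypothetical number of stripes

def traditional_layout(n_disks, width, n_stripes):
    """Fixed RAID groups: disks 0..width-1 form group 0, and so on."""
    assert n_disks % width == 0, "toy model needs whole groups"
    groups = [list(range(g, g + width)) for g in range(0, n_disks, width)]
    return [groups[s % len(groups)] for s in range(n_stripes)]

def declustered_layout(n_disks, width, n_stripes, seed=0):
    """Each stripe is placed on its own pseudo-random subset of disks."""
    rng = random.Random(seed)
    return [rng.sample(range(n_disks), width) for _ in range(n_stripes)]

def rebuild_reads(layout, failed):
    """Count strips read from each surviving disk to rebuild `failed`.
    Simplification: every surviving strip of an affected stripe is read."""
    reads = Counter()
    for stripe in layout:
        if failed in stripe:
            for disk in stripe:
                if disk != failed:
                    reads[disk] += 1
    return reads

if __name__ == "__main__":
    layouts = [
        ("traditional", traditional_layout(N_DISKS, STRIPE_WIDTH, N_STRIPES)),
        ("declustered", declustered_layout(N_DISKS, STRIPE_WIDTH, N_STRIPES)),
    ]
    for name, layout in layouts:
        reads = rebuild_reads(layout, failed=0)
        print(f"{name:12s} disks read: {len(reads):2d}  "
              f"max reads on one disk: {max(reads.values())}")
```

With the traditional layout, the rebuild hammers the 10 peers in disk 0's group. With the declustered layout, roughly the same total work is spread across all 21 surviving disks, so the busiest disk does about half as many reads and the rebuild can finish sooner.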
![Page 34](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/34.jpg)
![Page 35](https://reader034.fdocuments.us/reader034/viewer/2022052614/605831e43408ad5f6b5ab8de/html5/thumbnails/35.jpg)
Thank you