Exascale Evolution. Brad Benton, IBM. March 15, 2010.
Exascale Evolution
Brad Benton, IBM
March 15, 2010
www.openfabrics.org
Agenda
• Exascale Challenges
• On the Path to Exascale: A Look at Blue Waters
Exascale Challenges
Exascale Challenges
• Challenges at every level of system design
  – Managing 500M to 1B (most likely heterogeneous) cores
  – Programming models to exploit multi-core + accelerators
  – Interconnect
    • How will IB/RC scale to exascale?
    • How do we “get off the bus”?
    • How can we put more capability in the interconnect?
  – Power Management
    • Power vs. Performance tradeoffs
Exascale Challenges
• Challenges at every level of system design
  – Resilience/Fault-Tolerance
    • At this scale, something will always be broken or in the process of breaking
  – Development Environment/Performance Tuning
  – Workflow Management/Process Steering
  – Data Management/Storage/Visualization
Exascale Challenges
• Resiliency/Fault-Tolerance
  – F/T Model
    • Fault Detection
    • Fault Isolation
    • Fault Containment
    • Fault Recovery
    • Re-integration
  – Software Resiliency
    • More than just checkpoint/restart
    • Containers/virtualization
    • Suspend/migrate/resume
Programming Models
• MPI
  – Will it survive in an exascale world? (Its demise was predicted at petascale, but it seems to be doing okay.)
• Evolve hybrid language models: MPI + “What?”
  – OpenMP
  – GPU Accelerators (CUDA, OpenCL)
  – PGAS languages
• Greater exploitation of autotuning, i.e., programs that write programs
  – ATLAS
  – FFTW
  – IBM HPC Toolkit has some of this
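The autotuning idea named on this slide can be sketched as an empirical search, in the spirit of ATLAS and FFTW: generate several variants of one kernel, time each on the target machine, and keep the fastest. The Python sketch below is illustrative only. `make_blocked_sum`, `autotune`, and the candidate block sizes are hypothetical names and values chosen for the example, not part of any of the tools listed above.

```python
import timeit


def make_blocked_sum(block):
    """Generate one variant of a summation kernel, specialized for a block size."""
    def blocked_sum(data):
        total = 0
        for start in range(0, len(data), block):
            total += sum(data[start:start + block])
        return total
    blocked_sum.block = block  # remember which parameter produced this variant
    return blocked_sum


def autotune(variants, sample, repeats=3):
    """Time every variant on the sample input; return (fastest_variant, timings)."""
    timings = {}
    for variant in variants:
        # Keep the best of a few runs to reduce timing noise.
        timings[variant.block] = min(
            timeit.repeat(lambda: variant(sample), number=5, repeat=repeats)
        )
    best_block = min(timings, key=timings.get)
    best = next(v for v in variants if v.block == best_block)
    return best, timings


candidate_blocks = [16, 256, 4096]  # hypothetical search space
variants = [make_blocked_sum(b) for b in candidate_blocks]
sample = list(range(100_000))

best, timings = autotune(variants, sample)

# Every variant must compute the same answer; only the speed differs.
assert all(v(sample) == sum(sample) for v in variants)
print(f"selected block size {best.block}; timings (s): {timings}")
```

Which block size wins depends on the machine, which is the point: the selection is made by measurement rather than by a fixed rule.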
On the Path to Exascale:
A look at Blue Waters
NCSA Blue Waters
• Joint effort between NCSA and University of Illinois
  http://www.ncsa.illinois.edu/BlueWaters/
• First Deliverable of a system based on PERCS technology (2011)
• Will be the world’s first sustained petascale system for open scientific research
• http://www.ncsa.illinois.edu/BlueWaters/pdfs/snir-power7.pdf for more detailed information
Blue Waters Overview
• Approximately 10 PF/s peak
• More than 300,000 cores (homogeneous)
• More than 1 Petabyte memory
• More than 10 Petabyte disk storage
• More than 0.5 Exabyte archival storage
• More than 1 PF/s sustained on scientific applications
Building Blue Waters
Multi-chip Module
• 4 Power7 chips
• 128 GB memory
• 512 GB/s memory bandwidth
• 1 TF (peak)
Router
• 1,128 GB/s bandwidth
IH Server Node
• 8 MCMs (256 cores)
• 1 TB memory
• 8 TF (peak)
• Fully water cooled
Blue Waters Building Block
• 32 IH server nodes
• 32 TB memory
• 256 TF (peak)
• 4 Storage systems
• 10 Tape drive connections
Blue Waters
• ~1 PF sustained
• >300,000 cores
• >1 PB of memory
• >10 PB of disk storage
• ~500 PB of archival storage
• >100 Gbps connectivity
Blue Waters is built from components that can also be used to build systems with a wide range of capabilities—from deskside to beyond Blue Waters.
Blue Waters will be the most powerful computer in the world for scientific research when it comes online in Summer of 2011.
CI Days • 22 February 2010 • University of Kentucky
Power7 Chip
• 8 cores, 32 threads
• L1, L2, L3 cache (32 MB)
• Up to 256 GF (peak)
• 45 nm technology
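The building-block figures on this slide compose multiplicatively, which a short roll-up makes easy to check. The sketch below uses only the counts stated on the slide (8 cores and 256 GF per chip, 4 chips and 128 GB per multi-chip module, 8 modules per IH server node, 32 nodes per building block). The constant and function names are hypothetical, and the factor of 1024 used for printing is an assumption that happens to reproduce the slide's round numbers.

```python
# Figures taken from the slide; names are illustrative.
CHIP_CORES = 8
CHIP_PEAK_GF = 256
CHIPS_PER_MCM = 4
MCM_MEMORY_GB = 128
MCMS_PER_NODE = 8
NODES_PER_BLOCK = 32


def roll_up():
    """Scale per-chip and per-module figures up to node and building-block level."""
    mcm = {
        "cores": CHIP_CORES * CHIPS_PER_MCM,
        "memory_gb": MCM_MEMORY_GB,
        "peak_gf": CHIP_PEAK_GF * CHIPS_PER_MCM,
    }
    node = {key: value * MCMS_PER_NODE for key, value in mcm.items()}
    block = {key: value * NODES_PER_BLOCK for key, value in node.items()}
    return {"MCM": mcm, "IH server node": node, "building block": block}


levels = roll_up()
for name, level in levels.items():
    # Dividing by 1024 is an assumption; it matches the slide's 1 / 8 / 256 TF
    # and 1 / 32 TB figures exactly.
    print(
        f"{name}: {level['cores']} cores, "
        f"{level['memory_gb'] / 1024:g} TB memory, "
        f"{level['peak_gf'] / 1024:g} TF peak"
    )
```

Under these assumptions the roll-up gives 32 cores per module, 256 cores per node, and 8,192 cores per building block.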
Power7 Chip: Computational Heart of Blue Waters
• Base Technology
  – 45 nm, 576 mm²
  – 1.2 B transistors
• Chip
  – 8 cores
  – 12 execution units/core
  – 1, 2, 4 way SMT/core
  – Up to 4 FMAs/cycle
  – Caches
    • 32 KB I, D-cache, 256 KB L2/core
    • 32 MB L3 (private/shared)
  – Dual DDR3 memory controllers
    • 128 GB/s peak memory bandwidth (1/2 byte/flop)
  – Clock range of 3.5 – 4 GHz
[Figures: Quad-chip MCM; Power7 Chip]
High-End Server Resilience
Feeds and Speeds per MCM
• 32 cores
• 8 Flop/cycle per core
• 4 threads per core max
• 3.5 – 4 GHz
• 1 TF/s
• 32 MB L3
• 512 GB/s memory BW (0.5 Byte/flop)
• 800 W (0.8 W per Gflop/s)
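The derived figures on this slide follow from the basic ones: peak rate is cores times flops per cycle times clock, and the balance ratios divide memory bandwidth and power by that peak. The sketch below works this out as a quick check. `mcm_metrics` is a hypothetical helper, and the default clock of 4.0 GHz is an assumption taken from the top of the stated 3.5 to 4 GHz range.

```python
def mcm_metrics(cores=32, flops_per_cycle=8, clock_ghz=4.0,
                mem_bw_gbs=512.0, power_w=800.0):
    """Derive peak rate and balance ratios for one multi-chip module."""
    peak_gflops = cores * flops_per_cycle * clock_ghz  # Gflop/s
    return {
        "peak_gflops": peak_gflops,
        "bytes_per_flop": mem_bw_gbs / peak_gflops,    # (GB/s) / (Gflop/s)
        "watts_per_gflops": power_w / peak_gflops,     # W per Gflop/s
    }


for clock in (3.5, 4.0):
    m = mcm_metrics(clock_ghz=clock)
    print(
        f"{clock} GHz: {m['peak_gflops']:.0f} Gflop/s peak, "
        f"{m['bytes_per_flop']:.2f} byte/flop, "
        f"{m['watts_per_gflops']:.2f} W per Gflop/s"
    )
```

At 4 GHz this gives 1,024 Gflop/s (about 1 TF/s), 0.5 byte/flop, and roughly 0.8 W per Gflop/s of peak performance.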
First Level Interconnect
• L-Local: HUB-to-HUB copper wiring
• 256 cores
[Figure: 1st Level Local Interconnect (256 cores). One drawer with eight QCMs (QCM 0 to QCM 7, locations U-P1-M1 to U-P1-M8), each holding four Power7 chips (P7-0 to P7-3); eight HUB modules (HUB0 to HUB7); 128 DIMM slots (N0 to N7, DIMM00 to DIMM15 each); 17 PCIe slots (P1-C1-C1 to P1-C17-C1); DCA-0 (top) and DCA-1 (bottom) connectors; optical fan-out from the HUB modules: 2,304 fiber 'L-Link' and 64/40 optical 'D-Link'.]
ONE DRAWER: 8 MCMs, 32 chips, 256 cores
Interconnect: 1.1 TB/s HUB
• 192 GB/s Host Connection
• 336 GB/s to 7 other local nodes in the same drawer
• 240 GB/s to local-remote nodes in the same supernode (4 drawers)
• 320 GB/s to remote nodes
• 40 GB/s to general purpose I/O
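The five bandwidth figures on this slide add up to the headline hub number, which a few lines of Python can confirm. The dictionary keys below are descriptive labels made up for this sketch; the values are the ones on the slide. The per-peer figure assumes the 336 GB/s is split evenly across the 7 local peers, which the slide does not state explicitly.

```python
# Aggregate hub bandwidth by destination, in GB/s (values from the slide).
HUB_BANDWIDTH_GBS = {
    "host connection": 192,
    "local nodes, same drawer": 336,
    "local-remote nodes, same supernode": 240,
    "remote nodes": 320,
    "general purpose I/O": 40,
}

total_gbs = sum(HUB_BANDWIDTH_GBS.values())

for destination, gbs in HUB_BANDWIDTH_GBS.items():
    print(f"{destination:36s} {gbs:4d} GB/s  ({gbs / total_gbs:5.1%})")
print(f"{'total':36s} {total_gbs:4d} GB/s  (~{total_gbs / 1000:.1f} TB/s)")

# Assumption: the drawer-local bandwidth is shared equally by the 7 peers.
LOCAL_PEERS = 7
per_local_peer_gbs = HUB_BANDWIDTH_GBS["local nodes, same drawer"] / LOCAL_PEERS
print(f"per local peer: {per_local_peer_gbs:g} GB/s")
```

The total of 1,128 GB/s matches the router bandwidth quoted on the "Building Blue Waters" slide and rounds to the 1.1 TB/s in this slide's title.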
Second Level Interconnect
• Optical ‘L-Remote’ links from HUB
• Construct Super Node (4 CECs)
• 1,024 cores
[Figure: Super Node (32 Nodes / 4 CEC) with L-Link cables]
ONE SUPERNODE: 4 drawers, 32 MCMs, 128 chips, 1024 cores
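The two interconnect levels can be illustrated by enumerating hub pairs inside one supernode and classifying each pair by whether the hubs share a drawer. This is a sketch under two labeled assumptions: one hub per MCM (8 per drawer, as the drawer diagram suggests), and a direct link between every pair of hubs in the supernode. The slides give aggregate bandwidths, not link counts, so the counts printed here are a consequence of those assumptions rather than figures from the presentation.

```python
from itertools import combinations

DRAWERS_PER_SUPERNODE = 4
HUBS_PER_DRAWER = 8   # assumption: one hub per MCM
CORES_PER_MCM = 32

# Identify each hub by (drawer index, position within the drawer).
hubs = [(d, h) for d in range(DRAWERS_PER_SUPERNODE) for h in range(HUBS_PER_DRAWER)]


def link_type(a, b):
    """Same drawer means first-level 'L-local'; otherwise second-level 'L-remote'."""
    return "L-local" if a[0] == b[0] else "L-remote"


# Assumption: every pair of hubs in the supernode is directly connected.
link_counts = {"L-local": 0, "L-remote": 0}
for a, b in combinations(hubs, 2):
    link_counts[link_type(a, b)] += 1

local_peers = HUBS_PER_DRAWER - 1
remote_peers = len(hubs) - HUBS_PER_DRAWER
supernode_cores = len(hubs) * CORES_PER_MCM

print(f"hubs per supernode: {len(hubs)}, cores: {supernode_cores}")
print(f"peers per hub: {local_peers} L-local, {remote_peers} L-remote")
print(f"links per supernode: {link_counts}")
```

The 7 same-drawer peers agree with the "7 other local nodes" on the hub bandwidth slide, and the core count agrees with the 1,024 cores stated for one supernode.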
BPA
• 200 to 480 Vac
• 370 to 575 Vdc
• Redundant Power
• Direct Site Power Feed
• PDU Elimination
WCU
• Facility Water Input
• 100% Heat to Water
• Redundant Cooling
• CRAH Eliminated
Storage Unit
• 4U
• 0-6 / Rack
• Up To 384 SFF DASD / Unit
• File System
CECs
• 2U
• 1-12 CECs/Rack
• 256 Cores
• 128 SN DIMM Slots / CEC
• 8, 16, (32) GB DIMMs
• 17 PCI-e Slots
• Imbedded Switch
• Redundant DCA
• NW Fabric
• Up to: 3072 cores, 24.6 TB (49.2 TB)
Rack
• 990.6w x 1828.8d x 2108.2
• 39”w x 72”d x 83”h
• ~2948 kg (~6500 lbs)
Rack Components
• Compute
• Storage
• Switch
• 100% Cooling
• PDU Eliminated
Input: 8 Water Lines, 4 Power Cords
Out: ~100 TFLOPs / 24.6 TB / 153.5 TB; 192 PCI-e 16x / 12 PCI-e 8x
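Several of the rack-level numbers on this slide can be reproduced from the per-CEC figures. The sketch below does that arithmetic. Constant and function names are hypothetical, and reading 24.6 TB as the 16 GB DIMM case and 49.2 TB as the 32 GB case is an inference from the numbers, not something the slide states.

```python
CECS_PER_RACK = 12        # upper end of "1-12 CECs/Rack"
CORES_PER_CEC = 256
DIMM_SLOTS_PER_CEC = 128
PCIE_SLOTS_PER_CEC = 17
MM_PER_INCH = 25.4


def rack_memory_tb(dimm_gb):
    """Memory of a fully populated rack in decimal terabytes."""
    return CECS_PER_RACK * DIMM_SLOTS_PER_CEC * dimm_gb / 1000


rack_cores = CECS_PER_RACK * CORES_PER_CEC
rack_pcie_slots = CECS_PER_RACK * PCIE_SLOTS_PER_CEC
rack_dims_mm = [round(inches * MM_PER_INCH, 1) for inches in (39, 72, 83)]

print(f"cores per rack: {rack_cores}")
print(f"memory with 16 GB DIMMs: {rack_memory_tb(16):.1f} TB")
print(f"memory with 32 GB DIMMs: {rack_memory_tb(32):.1f} TB")
print(f"PCIe slots per rack: {rack_pcie_slots} (slide lists 192 x16 + 12 x8)")
print(f"rack dimensions in mm (w, d, h): {rack_dims_mm}")
```

The slot total of 204 equals the 192 + 12 listed under "Out", and the converted dimensions match the metric figures given for the rack.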
How does this affect OFA?
• Blue Waters can connect externally via PCIe devices (e.g., InfiniBand) as needed
• Blue Waters interconnect
  – Is RDMA based
  – Is not InfiniBand (or iWARP or RoCEE)
  – Hardware support for Global Shared Memory
• Pendulum is swinging back to proprietary interconnects (at least at IBM)
• Is there a path to OFA compatibility?
  – How can/should OFA accept/support new/different RDMA interconnects?
  – How can/should IBM work with OFA to embrace new interconnect technologies?
Exascale Evolution
• Technical Evolution is not always in a straight line
• Different technologies evolve at different times and rates
  – For example, Blue Waters is not a direct descendant of RoadRunner/Cell, but rather of POWER/Federation/SP
• Reaching exascale levels will require the consolidation and continued evolution of multiple technologies