Overview of the New Blue Gene/L Computer
Dr. Richard D. Loft, Deputy Director of R&D
Scientific Computing Division, National Center for Atmospheric Research
Outline
• What is Blue Gene/L and why is it interesting?
• How did one end up at NCAR?
• What is the objective of the NCAR Blue Gene/L project?
• What is the status of it?
• How do I get an account on Blue Gene/L?
Why Blue Gene/L is Interesting
• Features
– Massive parallelism: fastest in the world (137 Tflops)
– Achieves high packaging density (2048 PEs/rack)
– Lower power per processor (25 kW/rack)
– Dedicated reduction network (solver scalability)
– Puts network interfaces on chip (embedded tech.)
– Conventional programming model:
  • xlf90, xlc compilers
  • MPI
Fuel Efficiency: Gflops/Watt
[Figure: bar chart of Gflops/Watt (axis 0 to 0.9) for the Top 20 systems, based on processor power rating only. The Blue Gene/L systems are called out: BlueGene/L DD2 beta-System (0.7 GHz PowerPC 440), BlueGene/L DD2 Prototype (0.7 GHz PowerPC 440), and BlueGene/L DD1 Prototype (0.5 GHz PowerPC 440 w/Custom). Other systems shown include the Earth-Simulator, ASCI Q, ASCI White, and Cray X1.]
BG/L Questions/Limitations
• Questions
– High reliability? (1/N effect)
– Applications for 100k processors? (Amdahl’s Law)
– System robustness: I/O, scheduling flexibility.
• Limitations
– Node memory limitation (512 MB/node)
– Partitioning is quantized (power of two)
– Simple node kernel (no forks -> no threads -> no OMP)
– No support for multiple executables.
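The two scaling questions on this slide (the 1/N reliability effect and Amdahl's Law at 100k processors) can be made concrete with a little arithmetic. The sketch below is illustrative only: the serial fractions and the per-node mean time between failures are made-up values, not figures from the talk, and the reliability model assumes independent, identical nodes.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's Law: speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """1/N effect: with independent, identical nodes, system MTBF shrinks as 1/N."""
    return node_mtbf_hours / nodes


if __name__ == "__main__":
    processors = 100_000
    # Hypothetical serial fractions, chosen only to show the trend.
    for serial_fraction in (1e-3, 1e-4, 1e-5):
        speedup = amdahl_speedup(serial_fraction, processors)
        print(f"serial fraction {serial_fraction:g}: speedup {speedup:,.0f} on {processors:,} processors")

    # Hypothetical per-node MTBF of one million hours, not a measured value.
    print(f"system MTBF: {system_mtbf_hours(1e6, 65_536):.1f} hours for 65,536 nodes")
```

Under these assumptions, even a very small serial fraction caps the usable speedup well below the processor count, which is why the slide pairs the 100k-processor question with Amdahl's Law.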
BlueGene/L ASIC
[Figure: block diagram of the BlueGene/L ASIC. Labeled blocks: two 440 CPUs (one marked "I/O proc"), each with a “Double FPU”, 32k/32k L1, and L2; snoop; PLB (4:1); multiported shared SRAM buffer; shared L3 directory for EDRAM (includes ECC); 4MB EDRAM (L3 cache or memory); DDR control with ECC, 144 bit wide DDR, 256MB; Gbit Ethernet; JTAG access; torus (6 out and 6 in, each at 1.4 Gbit/s link); tree (3 out and 3 in, each at 2.8 Gbit/s link); global interrupt (4 global barriers or interrupts). Bandwidth labels on the diagram: 2.7 GB/s, 5.5 GB/s, 11 GB/s, 22 GB/s.]
The Blue Gene/L Architecture
BlueGene/L Has Five Networks
• 3-Dimensional Torus
– interconnects all compute nodes
– 175 MB/sec/link bidirectional
• Global Tree
– point-to-point, one-to-all broadcast, reduction functionality
– 1.5 microsecond latency (at 64K nodes)
• Global Interrupts
– AND/OR operations for global barriers
– 1.5 microsecond latency (64K system)
• Ethernet
– incorporated into every node ASIC
– active in the I/O nodes (1:64 in LLNL configuration)
  • 1K 1Gbit links
– all external comm. (file I/O, control, user interaction, etc.)
• JTAG (Control)
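To make the torus and Ethernet figures on this slide concrete, here is a minimal sketch of nearest-neighbor addressing on a 3-D torus, plus two bits of arithmetic. The 8 x 8 x 16 shape is an assumed example, "64K" is read as 65,536 nodes, and all function names are hypothetical; none of this is the BG/L system software.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors (one step each way in x, y, z) with wraparound."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of link traversals between two nodes on the torus."""
    return sum(min((x - y) % d, (y - x) % d) for x, y, d in zip(a, b, dims))


def io_node_count(compute_nodes: int, compute_per_io: int) -> int:
    """Number of I/O nodes for a given compute-to-I/O ratio."""
    return compute_nodes // compute_per_io


def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert a link rate in Gbit/s to MB/s (decimal units)."""
    return gbit_per_s * 1000.0 / 8.0


if __name__ == "__main__":
    dims = (8, 8, 16)  # assumed example shape, 1024 nodes
    print("neighbors of (0, 0, 0):", torus_neighbors((0, 0, 0), dims))
    print("hops to opposite corner:", torus_hops((0, 0, 0), (4, 4, 8), dims))
    print("I/O nodes at 1:64 for 65,536 nodes:", io_node_count(65_536, 64))
    print("1.4 Gbit/s link in MB/s:", gbit_to_mbyte_per_s(1.4))
```

The wraparound links are what keep the worst-case hop count at half of each dimension rather than the full extent. The 1:64 ratio applied to 65,536 compute nodes is consistent with the "1K 1Gbit links" quoted for the LLNL configuration, and 1.4 Gbit/s per link direction matches 175 MB/sec.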
BlueGene/L System Software Architecture
• User applications execute exclusively in the compute nodes
– avoid asynchronous events (e.g., daemons, interrupts)
• The outside world interacts only with the I/O nodes, an offload engine
– standard solution: Linux
• Machine monitoring and control also offloaded to service nodes: large SP system or Linux cluster.
Blue Gene/L system overview
Blue Gene/L @ NCAR
How did one get to NCAR?
• MRI proposal in partnership with CU’s
• Elements of MRI proposal to NSF: proving out an experimental architecture.
– Application porting and scalability
– System software testing
  • Parallel file systems (Lustre, GPFS)
  • Schedulers (LSF, SLURM, COBALT)
– Education
BlueGene/L Collaboration
[Figure: NCAR, CU Boulder, and CU Denver grouped around Blue Gene/L.]
BlueGene/L Collaborators
• NCAR
– Richard Loft
– Janice Coen
– Stephen Thomas
– Wojciech Grabowski
• CU Boulder
– Henry Tufo
– Xiao-Chuan Cai
– Charbel Farhat
– Thomas Manteuffel
– Stephen McCormick
• CU Denver
– Jan Mandel
– Andrew Knyazev
Details of NCAR/CU Blue Gene/L
• 2048 processors, 5.73 Tflops peak
• 4.61 Tflops on Linpack Benchmark
• Unofficially, 33rd fastest system in the world (in one rack!)
• 6 Tbytes of high performance disk
• Delivered to Mesa Lab: March 15th
• Acceptance tests
– Began March 23rd.
– Completed March 28th.
– First PI meeting March 30th.
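The peak and Linpack figures quoted on this slide can be cross-checked with a short calculation. This is a sketch under two assumptions that the slide itself does not state: a 0.7 GHz PowerPC 440 clock (the rate quoted for the DD2 hardware), and 4 floating-point operations per cycle per processor (a double FPU issuing two fused multiply-adds each cycle).

```python
PROCESSORS = 2048        # from the slide
LINPACK_TFLOPS = 4.61    # from the slide
CLOCK_HZ = 0.7e9         # assumption: 0.7 GHz PowerPC 440 (DD2 hardware)
FLOPS_PER_CYCLE = 4      # assumption: double FPU, two fused multiply-adds per cycle


def peak_tflops(processors: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Tflops: processors x clock rate x flops per cycle."""
    return processors * clock_hz * flops_per_cycle / 1e12


def linpack_efficiency(measured_tflops: float, peak: float) -> float:
    """Fraction of theoretical peak achieved on the benchmark."""
    return measured_tflops / peak


if __name__ == "__main__":
    peak = peak_tflops(PROCESSORS, CLOCK_HZ, FLOPS_PER_CYCLE)
    print(f"peak: {peak:.2f} Tflops")
    print(f"Linpack efficiency: {linpack_efficiency(LINPACK_TFLOPS, peak):.1%}")
```

With these assumptions the computed peak agrees with the 5.73 Tflops on the slide, and the Linpack result works out to roughly 80% of peak.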
BG/L Front-End Architecture
Bring-up of Frost BG/L System
• Criteria for readiness
– Scheduler
– Fine Grain Partitions
– I/O subsystem ready
– MSS connection
Current “Frost” BG/L Status
• MSS connections in place.
• I/O system issues appear to be behind us.
• Partition definitions (512, 256, 128, 4x32) in place.
• Codes ported: POP, WRF, HOMME, BOB, BGC5 (pointwise)
• Biggest apps issue: memory footprint
• Establishing relationships with other centers
– BG/L Consortium membership
– Other BG/L sites: SDSC, Argonne, LLNL, Edinburgh
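The partition definitions listed on this slide can be sanity-checked against the power-of-two quantization and the size of the rack. The sketch below assumes that "4x32" means four partitions of 32 nodes each and that the rack holds 1024 nodes (2048 processors at two per node); the parsing helper is hypothetical and not part of any BG/L tool.

```python
from typing import List

RACK_NODES = 1024  # assumption: 2048 processors at two processors per node


def parse_partitions(spec: str) -> List[int]:
    """Expand a spec such as '512,256,128,4x32' into a list of partition sizes."""
    sizes: List[int] = []
    for item in spec.split(","):
        item = item.strip()
        if "x" in item:
            # 'COUNTxSIZE' is read as COUNT partitions of SIZE nodes each.
            count, size = item.split("x")
            sizes.extend([int(size)] * int(count))
        else:
            sizes.append(int(item))
    return sizes


def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    sizes = parse_partitions("512,256,128,4x32")
    print("partition sizes:", sizes)
    print("all powers of two:", all(is_power_of_two(s) for s in sizes))
    print("nodes covered:", sum(sizes), "of", RACK_NODES)
```

Read this way, the listed partitions tile the assumed 1024-node rack exactly, with every partition a power of two.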
“Frost” BG/L I/O performance
[Figure: mean aggregate I/O rates on compute nodes. Throughput in MB/sec (axis 0 to 800) versus number of concurrent processes (axis 0 to 1200), with separate write rate and read rate curves.]
- each process wrote or read 1 GB of data
- I/O request size was 1 MB
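A measurement of this kind (each process moving a fixed volume of data in fixed-size requests) can be sketched for a single process as follows. This is not the benchmark used for the plot: the function name is hypothetical, the demo uses a much smaller file than 1 GB, and a read that immediately follows a write may be served from the operating system's cache rather than from disk.

```python
import os
import tempfile
import time
from typing import Tuple


def timed_write_read(path: str, total_bytes: int, request_bytes: int) -> Tuple[float, float]:
    """Write, then read, `total_bytes` in `request_bytes` requests.

    Returns (write_MB_per_s, read_MB_per_s). `total_bytes` should be a
    multiple of `request_bytes`.
    """
    buf = b"\0" * request_bytes
    requests = total_bytes // request_bytes

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(requests):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # include the time to push data to storage
    write_seconds = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(request_bytes):
            pass
    read_seconds = max(time.perf_counter() - start, 1e-9)

    megabytes = total_bytes / 1e6
    return megabytes / write_seconds, megabytes / read_seconds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # 16 requests of 1 MiB each; the slide's runs used 1 GB per process.
        write_rate, read_rate = timed_write_read(
            os.path.join(tmp, "io_test.dat"), 16 * 2**20, 2**20
        )
        print(f"write: {write_rate:.0f} MB/s, read: {read_rate:.0f} MB/s")
```

An aggregate rate like the one plotted would come from running many such processes concurrently and combining their results; how the original test did that is not stated on the slide.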
Blue Gene/L “Frost” scheduler status
• IRC chat room scheduler - “hey, get off!” … done
• LLNL SLURM scheduler - testing
– has been installed, tested, available for 512 node “midplane” partitions only.
– LLNL testbed system will be used to port SLURM to smaller partitions.
• Argonne Cobalt scheduler - being installed
– DB2 Client on the FEN
– Python
– ElementTree (XML processing library for Python)
– Xerces (XML parser)
– Supporting libraries (OpenSSL)
• Platform LSF - development account provided.
MRI Investigator Phase
• MRI Investigator access only
– Users related to MRI proposal
– Porting/testing evaluation
• Applications
– HOMME atmospheric GCM dycore (Thomas)
– Wildfire modeling (Coen)
– Scalable solvers - algebraic multigrid (Manteuffel, McCormick)
– Numerical Flight Test Simulation (Farhat)
– WRF - high resolution (Hacker)
User Access to Frost
• Cycles split
– 50% UCAR
– 40% CU Boulder
– 10% CU Denver
• Interested users (access policy TBD)
– UCAR: contact [email protected]
– CU: contact [email protected]
Questions?