Member of the Helmholtz Association
High Performance Computing at the Jülich Supercomputing Centre
Jutta Docter
Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC)
Overview
• Jülich Supercomputing Centre at Forschungszentrum Jülich
• JUQUEEN: installation, administration, usage, applications
• Fileserver
• PRACE (Partnership for Advanced Computing in Europe)
Where is Jülich?
Forschungszentrum Jülich (FZJ)
JSC
FZJ at a Glance (2012)
• Budget: 557 million € (thereof third-party funding: 173 million €)
• Staff: 5,200 (thereof 1,650 scientists)
• Visiting scientists: 860 from 40 countries
• Trainees: 90 per year
• Publications: 2,200 per year
• Protective rights and licences: 16,892
• Industry cooperations: 363
• Research fields: health, energy and environment, information technology; key technologies for tomorrow
Jülich Supercomputing Centre (JSC)
• Supercomputer operation for: Centre – FZJ; Regional – JARA; National – NIC, GCS; Europe – PRACE, EU projects
• Application support: traditional and SimLab support model, scientific visualization, peer review support and coordination
• R&D work: methods and algorithms, performance analysis and tools, community data management services, computer architectures, Exascale Laboratories with IBM, Intel, NVIDIA
• Education and training
Gauss Centre for Supercomputing
• German representative in PRACE
• Alliance of the three German national supercomputing centres:
  Jülich Supercomputing Centre (JSC)
  Leibniz-Rechenzentrum (LRZ), Munich
  Höchstleistungsrechenzentrum Stuttgart (HLRS)
• Support of computational science through multi-Petaflop/s supercomputers, multi-Petabyte storage, multi-Gigabit networking infrastructure, and large-scale projects
User Support @ JSC - Overview
Cross-sectional teams: Application Optimization, Methods & Algorithms, Parallel Performance
Simulation Laboratories: Biology, Climate Science, Molecular Systems, Plasma Physics, Solid & Fluid Engineering
JSC Supercomputer Systems: Dual Track Approach

Timeline 2004–2014, two tracks plus a shared file server (GPFS, Lustre):
• General-purpose: IBM Power 4+ JUMP (9 TFlop/s, 2004) → IBM Power 6 JUMP (9 TFlop/s) → JUROPA (200 TFlop/s) + HPC-FF (100 TFlop/s, 2009) → JUROPA++ cluster (1–2 PFlop/s + Booster, 2014)
• Highly scalable: IBM Blue Gene/L JUBL (45 TFlop/s) → IBM Blue Gene/P JUGENE (1 PFlop/s) → IBM Blue Gene/Q JUQUEEN (5.9 PFlop/s, 2012)
Transition: JUGENE → JUQUEEN
Timeline chart (Feb 2012 – Feb 2013): JUGENE (Blue Gene/P) production is scaled down and ends while JUQUEEN hardware is installed in stages (4, 8, 16, 24 and finally 28 racks), followed by user access and full production.
JUQUEEN, 8 Racks, April 2012
JUQUEEN, 24 Racks, November 2012
Top 500, Nov. 2012: Pos. 5
JUQUEEN, 28 Racks, January 2013
Top 500, Jun. 2013: Pos. 7
JUQUEEN – Hardware – Cables
• 58.9 t of computational hardware
• 3,584 torus data cables (29.7 km)
• 496 10GE cables (18 km)
• 84 Ethernet cables (3.4 km)
JUQUEEN – Hardware – Power
• 4.4 km power cables, total weight 7.9 t
• 112 power connections (3 phases)
• 336 circuit breakers
• 255 m steel for 7 frames (rack: 2 t)
• 200 m cable trays
JUQUEEN – Hardware – Water
• 280 m of stainless-steel pipes, separate valves per row
• 2 pumps with a max. flow rate of 210 m³ each
• Special pump control system for redundancy
• 18 °C supply temperature, demineralized water
• 27 °C return temperature
• Supply rate: 28 gal/min per rack
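A rough, back-of-the-envelope cross-check (not from the slides; assuming water at ~1 kg/L, c_p ≈ 4.19 kJ/(kg·K), and 28 US gal/min ≈ 1.77 kg/s):

\[
\dot{Q} \approx \dot{m}\, c_p\, \Delta T \approx 1.77\ \mathrm{kg/s} \times 4.19\ \mathrm{kJ/(kg\,K)} \times (27 - 18)\ \mathrm{K} \approx 67\ \mathrm{kW\ per\ rack},
\]

which is consistent with the average rack power of about 63 kW quoted on the configuration slide below.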
JUQUEEN Configuration
• 28 racks Blue Gene/Q (7 rows × 4 racks)
• Nodes: 28,672 (cores: 458,752); nodeboards: 896
• Main memory: 448 TB (16 GB per node)
• Overall peak performance: 5.9 Petaflop/s
• Power consumption: 10–80 kW per rack (average 63 kW)
• Processor: IBM PowerPC® A2, 16 cores per node (1.6 GHz, 64-bit), 16-way SMP, quad floating-point unit
• Internal network: 5D torus (A, B, C, D, E), 40 GB/s, 2.5 µs latency
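The 5.9 Pflop/s peak figure follows directly from the core count and clock, assuming 8 flops per core per cycle from the 4-wide FMA (quad) floating-point unit:

\[
458{,}752\ \text{cores} \times 1.6\ \mathrm{GHz} \times 8\ \tfrac{\mathrm{flop}}{\mathrm{cycle}} \approx 5.87 \times 10^{15}\ \mathrm{flop/s} \approx 5.9\ \mathrm{Pflop/s}.
\]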
Blue Gene/Q node board (with water cooling)
Source: IBM
Blue Gene/Q Compute Card
Heat spreader
JUQUEEN I/O Drawer
JUQUEEN Environment
• 248 I/O nodes (8 per drawer, on top of racks): 27 × 8 + 1 × 32 on the I/O-rich rack (1 ION per nodeboard), connected to 2 Cisco Nexus 7018 switches
• 4 front-end nodes (Red Hat) for user login and data transfer: [email protected]
• 1 service node (Red Hat) for Blue Gene software and DB2 database
• 1 backup service node, to be manually activated
• Node hardware: IBM p7 740, 8 cores (3.55 GHz), 128 GB memory, local storage device DS5020 (16 TB)
JUQUEEN Environment (block diagram): four front-end nodes (user access via SSH, job launch via runjob), service node and backup service node with the Blue Gene control system, DB2 database and RAID storage, JUQUEEN itself, the Nexus switches, and the JUST fileserver.
JUQUEEN – Cooling System
Schematic: 28 racks in 7 rows of 4 (R00–R63); two water circuits coupled through a heat exchanger, with pumps delivering 3 m³/min of deionized water. One circuit operates at 2.3 bar (warning at 1.5 bar, no automatic shutdown), the other at 5.5 bar (automatic shutdown at 0.5 bar).
Blue Gene/Q - Coolant Monitor per Rack
Cooling Issues – a Learning Curve!
Cooling problems result in a complete outage!
• External water issues: far too hot (32 °C) or too cold (7 °C)
• Pressure loss during pump switch-over → reconfiguration to run both pumps in parallel
• Pressure loss during work on the outer circuit
• Loss of water pressure, warnings, air bubbles in the system:
  – sensor misaligned? replacement of empty nodeboards? no, but prefill
  – installation of venting devices to eliminate the bubbles
  – checking of pressure variations
Hardware Failures and Monitoring
• Node or nodeboard failure → nodeboard in error (midplane not available) → replace; no refund for aborted user jobs (write checkpoints! – see the sketch after this list)
• Correctable errors → run diagnostics → replace(?)
• Cable failure → use integrated redundancy
• Preventive: run diagnostics as batch jobs
• BG/Q TEAL generates alerts and sends messages to the sysadmin
• Icinga server and JSC checking scripts send messages to the sysadmin and alarm people on duty (8:00–24:00, Mon–Sat)
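Since aborted jobs are not refunded, applications are expected to checkpoint their own state. A minimal per-rank sketch in C (hypothetical: file naming, state layout and checkpoint interval are arbitrary choices, not a JSC-provided scheme):

/* Hypothetical per-rank checkpointing sketch: every CKPT_INTERVAL steps
 * each rank dumps its local state to its own file, so a restarted job
 * can resume from the last completed checkpoint. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000
#define CKPT_INTERVAL 100

static void write_checkpoint(int rank, int step, const double *state, size_t n)
{
    char fname[64];
    snprintf(fname, sizeof(fname), "ckpt_rank%06d.dat", rank);
    FILE *f = fopen(fname, "wb");
    if (!f) return;                         /* sketch: ignore I/O errors */
    fwrite(&step, sizeof(step), 1, f);      /* record which step was reached */
    fwrite(state, sizeof(double), n, f);    /* dump the local state */
    fclose(f);
}

int main(int argc, char **argv)
{
    int rank;
    double *state = calloc(N, sizeof(double));

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 1; step <= 1000; step++) {
        state[step % N] += rank;            /* placeholder for real work */
        if (step % CKPT_INTERVAL == 0)
            write_checkpoint(rank, step, state, N);
    }

    free(state);
    MPI_Finalize();
    return 0;
}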
Hardware Replacements
• Regular: about 57 nodes and 7 nodeboards per month (of 28,672 nodes; MTTF > 1 year); overhead: diagnose, report, order, replace, watch, document, …
• Some I/O drawers and midplanes (delicate pins)
• Service card outage on the master clock rack – a single point of failure
• Additional preventive replacements: manufacturing defects on “G compute nodes”, bulk power modules (load distribution)
System Administration
• JSC Blue Gene administration team (3.5 PM)
• System administrator of the week
• JSC application support team
• IBM coverage: Mon–Fri 8:00–17:00, on-site HW + SW personnel
(New) Software Issues
• Blue Gene software updates (+ efixes): June 2012 V1.1.1, Jan. 2013 V1.2.0, Aug. 2013 V1.2.1
• Mellanox optical driver firmware update (3×)
• GPFS, LoadLeveler, compilers, …
• I/O nodes aborted with “out of memory” → adjustment / optimization of parameters (pagepool, buffers, etc.)
IO Performance - IO Nodes to GPFS
• IO optimization, GPFS (3.5.0.11) ongoing at FZJ (and Argonne Nat. Lab.)
• Read is worse than write, especially for shared files
• Testing the lower layers first: the network setup is o.k.
• Optimizing GPFS performance parameters
• Setup of a special GPFS / network testing environment
• Regular teleconferences with IBM – need to bring the experts together!
JSC LinkTest: Blue Gene torus link bandwidth tester
• All-to-all ping-pong test
• Bandwidth distribution: intra-node communication, communication via links A, B, C, D, and communication via link E
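The LinkTest source itself is not shown here; the following is a minimal MPI sketch of the underlying ping-pong bandwidth measurement between a single pair of ranks (message size and repeat count are arbitrary), whereas LinkTest runs such measurements for all rank pairs and attributes the results to individual torus links:

/* Minimal MPI ping-pong bandwidth sketch between ranks 0 and 1
 * (illustration only; the actual LinkTest measures all rank pairs
 * and maps the results onto the torus links A-E). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int msg_bytes = 1 << 20;   /* 1 MiB test message (arbitrary) */
    const int repeats   = 100;
    int rank, size;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    buf = malloc(msg_bytes);

    if (size >= 2 && rank < 2) {
        int peer = 1 - rank;
        double t0 = MPI_Wtime();
        for (int i = 0; i < repeats; i++) {
            if (rank == 0) {
                MPI_Send(buf, msg_bytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, msg_bytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, msg_bytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, msg_bytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)   /* 2 messages of msg_bytes per round trip */
            printf("ping-pong bandwidth: %.1f MB/s\n",
                   2.0 * repeats * msg_bytes / dt / 1.0e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}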
JSC llview
LoadLeveler Batch Scheduler - Job Classes
Class/Queue     Max. nodes        Wall clock limit   Priority / Notes
m048 / m056     24,576 / 28,672   24 h               on demand only
m032            16,384            24 h               on demand only
m016             8,192            24 h
m008             4,096            24 h
m004             2,048            24 h
m002             1,024            24 h
m001               512            24 h
n008               256            12 h
n004               128            12 h
n002                64            30 min
n001                32            30 min
serial               0            60 min             juqueenX
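For illustration, a minimal job command file for the smallest class might look roughly as follows. This is a hypothetical sketch using generic LoadLeveler Blue Gene keywords (job_type = bluegene, bg_size) and the runjob launcher; JUQUEEN-specific defaults, class selection and account settings are not taken from the slides:

# Hypothetical LoadLeveler job command file (a sketch, not an official JUQUEEN template)
# @ job_name         = bgq_example
# @ output           = $(job_name).$(jobid).out
# @ error            = $(job_name).$(jobid).err
# @ job_type         = bluegene
# bg_size requests 32 compute nodes, matching class n001 in the table above
# @ bg_size          = 32
# @ wall_clock_limit = 00:30:00
# @ queue
# runjob starts the executable on the allocated block:
# 512 MPI ranks = 32 nodes x 16 ranks per node
runjob --np 512 --ranks-per-node 16 : ./myapp

Submission would then typically be done with llsubmit <file>.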
Blue Gene Navigator (screenshot): block view of the machine showing the small partitions for classes n001, n002, n004 and n008 (ongoing replacements).
LoadLeveler - New version 5.1.0
• First and biggest BG/Q with LoadLeveler – testing on site!
• Negotiator dies (and restarts automatically)
• Reservations bigger than one midplane are not honored; jobs run on reserved partitions
• Erroneously waiting for missing cables; a restart helps
• Not scheduling jobs although resources are free (not checking all possibilities in the 5D torus)
• Stabilizing since August 2013
JUQUEEN User Groups – Germany – May 2013
Tools are ported:
• LinkTest
• Scalasca (application profiles and traces)
• SIONlib
Trace analysis by Scalasca on Blue Gene/Q
SIONlib – Scalable I/O library for parallel access to task-local files
• Supports writing and reading binary data to or from thousands of processors into a single or a small number of physical files
• General structure of a SION file: the starting positions of the blocks are aligned to the filesystem blocksize
• All writing and reading is done asynchronously
• Allows parallel access using MPI, OpenMP, or their combination, and sequential access for post-processing utilities
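A minimal usage sketch of the parallel MPI interface (hedged: based on the SIONlib 1.x API as generally documented, so exact parameter types, modes and defaults may differ between releases; the file name and chunk size here are arbitrary):

/* Hedged SIONlib sketch: every MPI task writes its task-local data
 * into one shared physical SION file. */
#include <mpi.h>
#include <stdio.h>
#include "sion.h"

#define NELEMS 131072   /* 1 MiB of doubles per task (arbitrary) */

int main(int argc, char **argv)
{
    static double localdata[NELEMS];
    int rank, sid, numfiles = 1, globalrank;
    sion_int64 chunksize = NELEMS * sizeof(double);  /* per-task chunk */
    sion_int32 fsblksize = -1;     /* -1: let SIONlib use the FS blocksize */
    MPI_Comm lcomm = MPI_COMM_WORLD;
    FILE *fp = NULL;
    char *newfname = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    globalrank = rank;
    for (int i = 0; i < NELEMS; i++) localdata[i] = rank;

    /* Collectively open one physical file; each task gets its own
     * blocksize-aligned chunk inside it. */
    sid = sion_paropen_mpi("parfile.sion", "bw", &numfiles, MPI_COMM_WORLD,
                           &lcomm, &chunksize, &fsblksize, &globalrank,
                           &fp, &newfname);

    sion_fwrite(localdata, sizeof(double), NELEMS, sid);  /* task-local write */
    sion_parclose_mpi(sid);

    MPI_Finalize();
    return 0;
}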
High-Q Club – Highest Scaling Codes on JUQUEEN
Goals:
• Promote the idea of exascale capability computing
• Showcase for codes that can utilise the entire 28 racks
• Invest in tuning and scaling the codes; show that they are capable of using all 458,752 cores, aiming at more than 1 million concurrent threads
• Use a variety of programming languages and parallelisation models, demonstrating individual approaches to reach that goal
• An important milestone in application development towards future HPC (exascale) systems
High-Q Club – Members
• dynQCD – Lattice Quantum Chromodynamics with dynamical fermions, JSC
• Gysela – a gyrokinetic code for modelling fusion core plasmas, CEA, France
• PEPC – a particle tree code for solving the N-body problem for Coulomb, gravitational and hydrodynamic systems, JSC
• PMG+PFASST – a space-time parallel multilevel solver, Univ. of Wuppertal / LBNL
• Terra-Neo – a multigrid solver for geophysics applications, Univ. of Erlangen
• waLBerla – a widely applicable Lattice Boltzmann solver, Univ. of Erlangen
JUQUEEN Porting and Tuning Workshop
• First workshop: February 2013
• Second workshop: 3–5 February 2014
http://www.fz-juelich.de/ias/jsc/EN/Expertise/High-Q-Club/_node.html
Disks, Tapes, Robots
Data Growth at JSC – User Data in GPFS
Chart: user data volume on disk and tape, 2003–2013, in terabytes (scale up to 13,000 TB).
Jülich Storage Server – JUST (~10 PB online)
Diagram: fileserver for JUDGE, JUQUEEN, JUROPA, JUVIS, the TSM server and a management server; aggregate bandwidth 160 GB/s; file systems $WORK, $HOME, $ARCH and $DATA, with capacities labelled 7.1 PB, 3 × 350 TB, and 2 × 700 TB + 1 × 350 TB.
Tape Storage – 16,600 tapes (44.5 PB)
Diagram: data migration, backup and archive from JUQUEEN, JUROPA, JUDGE and ~3,000 TSM clients on campus, via the IBM TSM server / ACSLS server and SAN, to two Oracle SL8500 tape libraries (one with 20 T10K B/C drives, one with 28 T10K A/B/C drives) located in separate buildings and connected through a private tape network.
Cartridge capacity: max. 5 TB; transfer rate: up to 240 MB/s.
PRACE – Partnership for Advanced Computing in Europe
Consists of 25 European partner states, each represented by one institution
http://www.prace-ri.eu
Goals
Prepares the creation of a persistent, sustainable pan-European HPC service
Prepares the establishment of Tier-0 supercomputing centres at different European sites
Defines and establishes a legal and organisational structure involving HPC centres, national funding agencies, and scientific user communities
Develops funding and usage models and establishes a peer review process
Provides training for European scientists and creates a permanent education programme
Tier-0 Systems - today
• “Curie”, Bull Bullx cluster (France)
• “Fermi”, IBM Blue Gene/Q (Italy)
• “Hermit”, Cray XE6 (Germany)
• “JUQUEEN”, IBM Blue Gene/Q (Germany)
• “MareNostrum”, IBM System x iDataPlex (Spain)
• “SuperMUC”, IBM System x iDataPlex (Germany)
JUQUEEN – Granted PRACE Projects - May 2013
PRACE Calls for Proposals for Project Access
Twice per year:
• Call opens in February: access is provided starting September of the same year.
• Call opens in September: access is provided starting March of the next year.
• The PRACE 8th Call for Project Access is now open until 15 October 2013: http://www.prace-ri.eu/Call-Announcements?lang=en
QUESTIONS?