A 1.7 Petaflops Warm-Water-Cooled System: Operational Experiences and Scientific Results

Łukasz Flis, Karol Krawentek, Marek Magryś


ACC Cyfronet AGH-UST

• established in 1973
• part of AGH University of Science and Technology in Krakow, Poland
• provides free computing resources for scientific institutions
• centre of competence in HPC and Grid Computing
• IT service management expertise (ITIL, ISO 20k)
• member of PIONIER consortium
• operator of Krakow MAN
• home for supercomputers


International projects


PL-Grid infrastructure

• Polish national IT infrastructure supporting e-Science
  – based upon resources of the most powerful academic computing centres
  – compatible and interoperable with the European Grid
  – offering grid and cloud computing paradigms
  – coordinated by Cyfronet
• Benefits for users
  – unified infrastructure built from 5 separate compute centres
  – unified access to software, compute and storage resources
  – non-trivial quality of service
• Challenges
  – unified monitoring, accounting, security
  – creating an environment of cooperation rather than competition
• Federation – the key to success


Competence Centre in the Field of Distributed Computing Grid Infrastructures

• Duration: 01.01.2014 – 30.11.2015
• Project Coordinator: Academic Computer Centre CYFRONET AGH

The main objective of the project is to support the development of ACC Cyfronet AGH as a specialized competence centre in the field of distributed computing infrastructures, with particular emphasis on grid technologies, cloud computing and infrastructures supporting computations on big data.

PLGrid Core project


ZEUS


374 TFLOPS, #211 on the TOP500 list, #1 in Poland


Zeus usage

[Chart: share of compute time by scientific discipline – chemistry 44.84%, physics 41.45%, medicine 7.87%; the remainder spread across technical sciences, astronomy, biology, computer science, electronics and telecommunications, metallurgy, mathematics, and other fields]


Why upgrade?

• Job size growth
• Users hate waiting for resources
• New projects, new requirements
• Follow the advances in HPC
• Power costs


New building


Requirements for the new system

• Petascale system
• Low TCO
• Energy efficiency
• Density
• Expandability
• Good MTBF
• Hardware:
  – core count
  – memory size
  – network topology
  – storage


Requirements: Liquid Cooling

• Water: up to 1000x more efficient heat exchange than air
• Less energy needed to move the coolant
• Hardware (CPUs, DIMMs) can handle ~80°C
• Challenge: cool 100% of the HW with liquid
  – network switches
  – PSUs


Requirements: MTBF

• The less movement the better
  – fewer pumps
  – fewer fans
  – fewer HDDs
• Example
  – pump MTBF: 50 000 hrs
  – fan MTBF: 50 000 hrs
  – 1800-node system MTBF: 7 hrs
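The ~7-hour figure follows from serial-system reliability: with N independent moving parts, each failing at rate 1/MTBF, the system fails N times as often. A minimal sketch, assuming roughly four moving parts per node (e.g. two pumps and two fans in a conventional design; the per-node count is not stated on the slide):

```python
# Serial-system MTBF: N independent components, each with constant
# failure rate 1/MTBF_c, give a system failure rate of N/MTBF_c,
# hence MTBF_sys = MTBF_c / N.

def system_mtbf(component_mtbf_hrs: float, n_components: int) -> float:
    """MTBF of a system that fails when any one component fails."""
    return component_mtbf_hrs / n_components

# Assumption (illustrative): ~4 moving parts per node.
nodes = 1800
moving_parts_per_node = 4
mtbf = system_mtbf(50_000, nodes * moving_parts_per_node)
print(f"{mtbf:.1f} hours")  # ~6.9 hours, matching the slide's ~7 hrs
```

This is why eliminating pumps, fans and HDDs from the nodes matters far more at scale than any single component's MTBF suggests.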


Requirements: Compute

• Max job size ~10k cores
• Fastest CPUs, but compatible with old codes
  – Two-socket nodes
  – No accelerators at this point
• Newest memory
  – At least 4 GB/core
• Fast interconnect
  – InfiniBand FDR
  – No need for a full CBB fat tree


Requirements: Topology

[Diagram: one service island (service nodes and storage nodes) plus compute islands of 576 nodes each, all connected through the core IB switches]


Why Apollo 8000?

• Most energy efficient
• The only solution with 100% warm-water cooling
• Highest density
• Lowest TCO


Even more Apollo

• Focuses also on the ‘1’ in PUE!
  – Power distribution
  – Fewer fans
  – Detailed monitoring
• ‘Energy to solution’
• Dry node maintenance
• Fewer cables
• Prefabricated piping
• Simplified management
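Why the ‘1’ matters: PUE is total facility power divided by IT power, so a good PUE only bounds the overhead above the IT load; removing node fans shrinks the ‘1’ itself. A rough illustration using the Prometheus figures from the next slide (PUE < 1.05, 680 kW peak; treating 680 kW as the IT load is an assumption):

```python
# PUE = total facility power / IT power.
it_power_kw = 680.0       # assumed to be the IT load
pue = 1.05
facility_kw = it_power_kw * pue
overhead_kw = facility_kw - it_power_kw
print(f"cooling/distribution overhead: {overhead_kw:.0f} kW")  # 34 kW

# Versus a conventional air-cooled room at PUE 1.5 (typical figure,
# not from the slides), running flat out year-round:
saved_kw = it_power_kw * (1.5 - pue)
annual_mwh = saved_kw * 8760 / 1000
print(f"saved vs PUE 1.5: {annual_mwh:.0f} MWh/year")
```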


Prometheus

• HP Apollo 8000
• 13 m², 15 racks (3 CDU, 12 compute)
• 1.65 PFLOPS
• PUE <1.05, 680 kW peak power
• 1728 nodes, Intel Haswell E5-2680v3
• 41472 cores, 13824 per island
• 216 TB DDR4 RAM
• System prepared for expansion
• CentOS 7
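The headline numbers are internally consistent; a quick sanity check (the E5-2680v3 is a 12-core part, two sockets per node):

```python
# Core count: 1728 two-socket nodes of 12-core Haswell CPUs.
nodes, sockets, cores_per_cpu = 1728, 2, 12
cores = nodes * sockets * cores_per_cpu
assert cores == 41_472

# Per-island count: 3 islands of 576 nodes each.
assert cores // 3 == 13_824

# Memory: 216 TB over 1728 nodes works out to 128 GB per node,
# i.e. ~5.3 GB/core, comfortably above the 4 GB/core requirement
# (using 1 TB = 1024 GB here).
gb_per_node = 216 * 1024 / nodes
print(gb_per_node)  # 128.0
```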


Prometheus storage

• Diskless compute nodes
• Separate tender for storage
  – Lustre-based
  – 2 file systems:
    • Scratch: 120 GB/s, 5 PB usable space
    • Archive: 60 GB/s, 5 PB usable space
  – HSM-ready
• NFS for home directories and software


Deployment timeline

• Day 0 - Contract signed (20.10.2014)
• Day 23 - Installation of the primary loop starts
• Day 35 - First delivery (service island)
• Day 56 - Apollo piping arrives
• Day 98 - 1st and 2nd islands delivered
• Day 101 - 3rd island delivered
• Day 111 - Basic acceptance ends

• Official launch event on 27.04.2015


Facility preparation

• Primary loop installation took 5 weeks
• Secondary (prefabricated) loop took just 1 week
• Upgrade of the raised floor done "just in case"
• Additional pipes for leakage/condensation drain
• Water dam with emergency drain
• A lot of space needed for the hardware deliveries (over 100 pallets)


Secondary loop


Challenges

• Power infrastructure being built in parallel
• Boot over InfiniBand
  – UEFI, high-frequency port flapping
  – OpenSM overloaded with port events
• BIOS settings being lost occasionally
• Node location in APM is tricky
• 5 dead IB cables (2‰)
• 8 broken nodes (4‰)
• 24h work during the weekend


Solutions

• Boot to RAM over IB, image distribution over HTTP
  – The whole machine boots up in 10 min with just 1 boot server
• Hostname/IP generator based on a MAC collector
  – Data automatically collected from APM and iLO
• Graphical monitoring of power, temperature and network traffic
  – SNMP data source
  – GUI allows easy problem location
  – Now synced with SLURM
• Spectacular iLO LED blinking system developed for the official launch
• 24h work during the weekend
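The idea behind the hostname/IP generator can be sketched as follows. This is an illustrative toy, not the actual Cyfronet tool: the naming scheme, subnet and layout constants are all hypothetical; only the principle (derive a stable hostname/IP from the node's physical location, keyed by the MAC collected from APM/iLO) comes from the slide.

```python
# Illustrative sketch: map a collected MAC plus the node's island/slot
# position to a deterministic hostname and IP, so diskless nodes get
# stable identities without per-node configuration.
import ipaddress

def assign(mac: str, island: int, node: int, base: str = "10.1.0.0/16"):
    """Return a hostname/IP record for a node at (island, node)."""
    net = ipaddress.ip_network(base)
    # Reserve a 1024-address block per island; +1 skips the network address.
    host = net.network_address + island * 1024 + node + 1
    return {
        "mac": mac.lower(),
        "hostname": f"p{island:02d}n{node:04d}",  # hypothetical scheme
        "ip": str(host),
    }

entry = assign("3C:A8:2A:01:02:03", island=1, node=7)
print(entry["hostname"], entry["ip"])  # p01n0007 10.1.4.8
```

Because the mapping is a pure function of position, regenerating it after a board swap only requires re-collecting the new MAC.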


System expansion

• Prometheus expansion already ordered
• 4th island
  – 432 regular nodes (2 CPUs, 128 GB RAM)
  – 72 nodes with GPGPUs (2x Nvidia Tesla K40XL)
• Installation to begin in September
• 2.4 PFLOPS total performance (Rpeak)
• 2232 nodes, 53568 CPU cores, 279 TB RAM


Future plans

• Push the system to its limits
• Further improvements of the monitoring tools
• Continue to move users from the previous system
• Detailed energy and temperature monitoring
• Energy-aware scheduling
• Survive the summer and measure performance
• Collect the annual energy and PUE figures
• HP-CAST 25 presentation?
