HPC - Science and Technology Facilities Council Technical Consultant Measuring and Optimising Energy...


1© Bull, 2012

MEW23 Liverpool 27th‐28th November 2012

Dr. Dan Kidger

HPC Technical Consultant

Measuring and Optimising Energy Consumption of Batch jobs under BULL SLURM


Industrial Electricity Prices in Europe

Source Eurostat – year 2010

[Chart: Electricity industrial prices in the EU, 1998 to 2010, in €/kWh (axis 0.00 to 0.12); series: Germany, Spain, France, Italy, Netherlands, Poland, United Kingdom, Norway]

http://epp.eurostat.ec.europa.eu/portal/page/portal/energy/data/main_tables#

Electricity prices highly variable across Europe: avg. 0.11 €/kWh

Electricity prices rising steadily: CAGR 12%
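To put these two figures in perspective, a small sketch of the arithmetic: the compound annual growth rate (CAGR) formula, and the yearly bill of a machine drawing a constant load at the quoted average price. The 1 MW load and the function names are hypothetical, chosen only for illustration; they are not figures from the slides.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def annual_energy_cost(load_kw: float, price_eur_per_kwh: float,
                       hours_per_year: float = 8760.0) -> float:
    """Yearly cost in euros of a constant load running all year round."""
    return load_kw * hours_per_year * price_eur_per_kwh


# Hypothetical 1 MW (1000 kW) system at the slide's average of 0.11 EUR/kWh.
cost = annual_energy_cost(1000.0, 0.11)
print(f"Yearly bill: {cost:,.0f} EUR")

# With a 12% CAGR, the same consumption costs 1.12**n times as much after n years.
for n in (1, 3, 5):
    print(f"after {n} year(s): {cost * 1.12 ** n:,.0f} EUR")
```

The point of the sketch is that the bill scales linearly with sustained power draw, which is why the rest of this talk focuses on measuring and reducing the energy of each job.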


Bull: European leader in mission‐critical digital systems

9,000 experts recognized worldwide in secure systems

Operating in 50 countries

€1.3bn revenues

+29% growth in profitability in 1st quarter 2012

+4.6% growth in 2011

+23% research effort in 2011


TERA 100 in figures: 1.25 PetaFlops
• 140 000+ Xeon cores
• 256 TB memory
• 30 PB disk storage
• 500 GB/s IO throughput
• 580 m² footprint

CURIE in figures: 2 PetaFlops
• 90 000+ Xeon cores
• 148 000 GPU cores
• 360 TB memory
• 10 PB disk storage
• 250 GB/s IO throughput
• 200 m² footprint

IFERC in figures: 1.5 PetaFlops
• 70 000+ Xeon cores
• 280 TB memory
• 15 PB disk storage
• 120 GB/s IO throughput
• 200 m² footprint

Bull in the Top500

18 systems in the Nov’12 Top500 list: 3 systems above 1 PFlops


Atomic Weapons Establishment

AWE confirms its trust in Bull with the upgrade of its 3 bullx supercomputers

New blades in the existing infrastructure

Simple replacement of the initial blades with new bullx B510 blades featuring the latest Sandy Bridge EP CPUs

• Willow: 2x 35 Tflops
• Whitebeam: 2x 156 Tflops
• Blackthorn: 145 Tflops
• Sycamore: 398 Tflops
• All existing bullx chassis re‐used to house the new blades
• Upgrade of the storage systems
• Cluster software upgraded to bullx supercomputer suite 4

3 systems in the Top500: Blackthorn, WillowA and WillowB


This innovative engineering company specializing in design for the motor racing industry wanted to:

Support the use of advanced virtual engineering technologies, developed in-house, for complete simulated vehicle design, development and testing

Solution:
• 198 bullx B500 compute blades
• 2 memory-rich bullx S6010 compute nodes for pre- and post-meshing


bullx supercomputer suite: key values

bullx MC
• Super‐fast image-based provisioning
• Web‐based multi‐level supervision
• Power management
• Automated health management
• Maintenance management

bullx PFS
• Highly available cell-based architecture
• Increased throughput and scalability

bullx BM
• Advanced placement policies
• Topology-aware resource allocation

bullx MPI
• Multi‐path network failover
• Abnormal pattern detection
• Topology-aware operations

bullx DE
• Complete best-of-breed set of tools (from compiling and debugging to profiling and optimizing)

bullx Linux
• HPC enabled (OS jitter reduction, optimized operations for increased application performance)
• Enhanced OFED

Components: Ksis, Lustre, Slurm, OpenMPI


About Slurm

• Originally intended as a simple resource manager, but has evolved into a sophisticated batch scheduler
• Simple and small enough for use by Intel for their 48‐core “cluster on a chip”
• Able to satisfy the scheduling requirements of major computer centers through optional plugins
• No single point of failure: backup daemons, fault‐tolerant job options
• Highly scalable (1.6M-core BlueGene/Q installation at LLNL)
• Highly portable (autoconf, extensive plugins for various environments)
• Open source (GPL v2)
• Operating on many of the world's largest computers
• About 500,000 lines of code today (plus test suite and documentation)


Power Management with Slurm

Existing energy-saving mechanism in SLURM:
• System-side feature
• Framework for energy saving through unutilized nodes
  – Administrator-configurable actions (hibernate, frequency scaling, power off, etc.)
  – Automatic “wake up” when jobs arrive

What can we do? Make energy saving a user concern:
• Monitor and report node and job energy consumption
• Give users control over their jobs' energy usage
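The existing system-side mechanism can be pictured as an idle-timeout policy: a node idle for longer than a threshold is put into a power-saving state and is woken again when a job needs it. The sketch below is a toy model under that assumption. The threshold plays the role of Slurm's `SuspendTime` setting, but the class and method names are hypothetical and this is not Slurm's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    idle_since: Optional[float] = 0.0  # None while a job is running on the node
    suspended: bool = False


class IdlePowerSaver:
    """Toy idle-timeout power saving (hypothetical names, not Slurm code)."""

    def __init__(self, nodes: List[str], suspend_after_s: float) -> None:
        self.nodes: Dict[str, Node] = {n: Node(n) for n in nodes}
        self.suspend_after_s = suspend_after_s

    def tick(self, now: float) -> List[str]:
        """Suspend every node idle for at least the threshold; return their names."""
        newly_suspended = []
        for node in self.nodes.values():
            if (node.idle_since is not None and not node.suspended
                    and now - node.idle_since >= self.suspend_after_s):
                node.suspended = True  # administrator action: hibernate, power off, ...
                newly_suspended.append(node.name)
        return newly_suspended

    def start_job(self, names: List[str]) -> List[str]:
        """Allocate nodes to a job, waking any suspended ones; return the woken names."""
        woken = []
        for name in names:
            node = self.nodes[name]
            if node.suspended:
                node.suspended = False  # automatic "wake up" when a job arrives
                woken.append(name)
            node.idle_since = None
        return woken

    def end_job(self, names: List[str], now: float) -> None:
        """Mark the job's nodes idle again from time `now`."""
        for name in names:
            self.nodes[name].idle_since = now
```

Everything here is decided by the administrator; the user has no say in how much energy a running job uses, which is the gap the next two tasks address.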


Task 1: Measuring Energy Consumption

Framework to support the capturing of power/energy consumption from the computing nodes

• Captures and reports per-node power/energy consumption
• Calculates per-step (job) energy consumption and stores it in the database along with the other execution characteristics
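Because the nodes expose cumulative energy counters, the energy of a job step can be derived by sampling each allocated node's counter when the step starts and when it ends, then summing the per-node differences. A minimal sketch under that assumption. The function name and dictionary layout are hypothetical, and the sample values are made up for illustration.

```python
from typing import Mapping


def step_energy_joules(start: Mapping[str, int], end: Mapping[str, int]) -> int:
    """Sum over nodes of (cumulative joules at step end - at step start)."""
    missing = set(start) ^ set(end)
    if missing:
        raise ValueError(f"nodes sampled only once: {sorted(missing)}")
    total = 0
    for node, joules_at_start in start.items():
        delta = end[node] - joules_at_start
        if delta < 0:
            raise ValueError(f"counter on {node} went backwards (reset or wrap?)")
        total += delta
    return total


# Hypothetical samples of each node's consumed-energy counter (joules).
at_start = {"node01": 1_650_000, "node02": 1_710_000}
at_end = {"node01": 1_659_800, "node02": 1_719_900}
print(step_energy_joules(at_start, at_end))  # prints 19700
```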


RAPL – power/energy measurements

• RAPL = “Running Average Power Limit”
• Available from Intel Sandy Bridge onwards
• Hardware registers for cumulative energy consumption:
  – PP0_ENERGY: energy used by “power plane 0”, which includes all cores and caches of a socket
  – PP1_ENERGY: energy used by the “uncore” (this may include an on-chip Intel GPU)
  – PACKAGE_ENERGY: total energy consumed by the entire package (PP0 + PP1)
  – DRAM_ENERGY: energy drawn by the memory controller inside the processor chip (the actual power fed into the main memory DIMMs is not included in the current measurement)
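The RAPL energy registers count in hardware-specific units and are only 32 bits wide, so a reader must scale the raw count and handle wrap-around between two samples. The sketch below assumes the layout documented in the Intel SDM: the energy status unit (ESU) is bits 12:8 of MSR_RAPL_POWER_UNIT, one tick equals 1/2^ESU joules, and the *_ENERGY_STATUS counters are 32 bits. It works on raw register values supplied by the caller rather than reading the MSRs itself, which requires root privileges. The function names are hypothetical.

```python
def energy_unit_joules(rapl_power_unit: int) -> float:
    """Joules per tick, from bits 12:8 (energy status units) of MSR_RAPL_POWER_UNIT."""
    esu = (rapl_power_unit >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(raw_start: int, raw_end: int, unit_j: float) -> float:
    """Energy between two raw 32-bit samples, assuming at most one wrap-around."""
    ticks = (raw_end - raw_start) & 0xFFFFFFFF  # modular difference absorbs a single wrap
    return ticks * unit_j


unit = energy_unit_joules(0xA1003)  # example register value: ESU = 16, about 15.3 uJ per tick
print(energy_delta_joules(0xFFFFFF00, 0x00000100, unit))  # prints 0.0078125
```

With a 1/65536 J unit the counter wraps after 65,536 J, roughly 11 minutes at a steady 100 W, so samples must be taken more often than that for the single-wrap assumption to hold.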


SLURM configuration

Easy configuration through the configuration file:

    # slurm.conf
    ...
    AcctGatherEnergyType=acct_gather_energy/rapl
    # sample node energy every 30 seconds
    AcctGatherNodeFreq=30

Power measurements are reported through scontrol:

    $ scontrol show node
    NodeName=berlin47 Arch=x86_64 CoresPerSocket=8
       CPUAlloc=0 CPUErr=0 CPUTot=32 Features=(null)
       Gres=(null)
       NodeAddr=berlin47 NodeHostName=berlin47
       OS=Linux RealMemory=1 Sockets=2 Boards=1
       State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1
       BootTime=2012-09-28T11:04:32 SlurmdStartTime=2012-10-08T10:59:45
       CurrentWatts=33 LowestJoules=106789 ConsumedJoules=1652356
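The energy fields appear as ordinary Key=Value pairs in the scontrol node output, so a monitoring script can pick them up easily. A minimal sketch, assuming the whitespace-separated Key=Value format of the berlin47 example and using only field names visible in that output. The helper name is hypothetical, and values containing spaces are not handled.

```python
import re
from typing import Dict

JOULES_PER_KWH = 3.6e6


def parse_scontrol_node(text: str) -> Dict[str, str]:
    """Split `scontrol show node` style output into a {key: value} dict."""
    return dict(re.findall(r"(\w+)=(\S+)", text))


sample = """NodeName=berlin47 Arch=x86_64 CoresPerSocket=8
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1
   CurrentWatts=33 LowestJoules=106789 ConsumedJoules=1652356"""

fields = parse_scontrol_node(sample)
watts = int(fields["CurrentWatts"])
consumed_kwh = int(fields["ConsumedJoules"]) / JOULES_PER_KWH
print(f"{fields['NodeName']}: {watts} W now, ConsumedJoules = {consumed_kwh:.3f} kWh")
```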


Task 2: Controlling Energy Consumption

Measuring energy consumption and reporting it at the step/job level is an important step, but users should also have a means to influence it. So we introduced the --cpu-freq parameter in srun. The user may request either a particular value in kilohertz or use low/medium/high, and the request will be matched to the closest possible numerical value.
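The matching rule can be sketched as follows. The list of available frequencies is hypothetical (a part that scales from 1.2 to 2.7 GHz in 100 MHz steps), and mapping low/medium/high to the lowest, middle and highest available step, as well as breaking ties toward the lower value, are assumptions made for illustration; the actual Slurm logic may differ in these details.

```python
from typing import Sequence, Union

# Hypothetical list of frequencies (kHz) a node supports: 1.2 GHz to 2.7 GHz.
AVAILABLE_KHZ = list(range(1_200_000, 2_700_001, 100_000))


def resolve_cpu_freq(request: Union[int, str], available: Sequence[int]) -> int:
    """Map a --cpu-freq style request to one of the available frequencies (kHz)."""
    freqs = sorted(available)
    if isinstance(request, str):
        keyword = request.lower()
        if keyword == "low":
            return freqs[0]
        if keyword == "medium":
            return freqs[len(freqs) // 2]
        if keyword == "high":
            return freqs[-1]
        raise ValueError(f"unknown frequency keyword: {request!r}")
    # Numeric request: pick the closest available value (the lower one on a tie).
    return min(freqs, key=lambda f: (abs(f - request), f))


print(resolve_cpu_freq(2_640_000, AVAILABLE_KHZ))
print(resolve_cpu_freq("medium", AVAILABLE_KHZ))
```

Restricting requests to values the hardware actually offers keeps the user-facing option simple while the node still runs at a frequency its governor can set.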


Static Frequency Scaling with SLURM jobs

    $ srun --cpu-freq=2700000 --resv-ports -N2 -n64 ./cg.C.64 &

    $ sacct -j 58 --format=jobid,elapsed,aveCPUFreq,consumedenergy
           JobID    Elapsed AveCPUFreq ConsumedEnergy
    ------------ ---------- ---------- --------------
              66   00:00:49    2640340          19668

AveCPUFreq: effective CPU frequency (kHz); ConsumedEnergy: job energy consumption (J)
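Dividing ConsumedEnergy by the elapsed time gives the job's average power draw. A minimal sketch using the figures from the sacct output above (19668 J over 00:00:49). The helper names are hypothetical, and only the plain HH:MM:SS form of the Elapsed field is handled (no day prefix).

```python
def elapsed_seconds(hhmmss: str) -> int:
    """Convert an sacct-style HH:MM:SS elapsed time to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_power_watts(consumed_joules: float, elapsed: str) -> float:
    """Average power (W) = energy (J) / time (s)."""
    return consumed_joules / elapsed_seconds(elapsed)


# Figures from the sacct output above: 19668 J in 49 s.
print(f"{average_power_watts(19668, '00:00:49'):.1f} W")  # prints 401.4 W
```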


Case Study: Conjugate Gradient Solver

    Average CPU frequency (kHz)   Elapsed time   Consumed energy (J)
    1200000                       00:01:35       19366
    1396460                       00:01:23       19018
    1780477                       00:01:09       19353
    1996186                       00:01:05       19817
    2200000                       00:01:02       20494
    2362500                       00:00:59       21408
    2653125                       00:00:56       23125

[Chart: ratio time / energy (normalized, 0 to 1) for each average CPU frequency, 1200000 to 2653125 kHz]

    $ srun --cpu-freq=2700000 --resv-ports -N2 -n64 ./cg.C.64

[Charts: wallclock time (s) and consumed energy (Joules) against CPU frequency in GHz (axis 1.00 to 3.00)]
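The case-study measurements can be summarized programmatically. The sketch below takes the seven (frequency, elapsed time, energy) rows from the slide and finds the frequency with the lowest energy to solution. As an additional metric not used on the slide, it also finds the lowest energy-delay product (energy × time), a common way to weigh energy savings against slowdown.

```python
# (average CPU frequency in kHz, elapsed time in s, consumed energy in J), from the slide.
RUNS = [
    (1_200_000, 95, 19366),
    (1_396_460, 83, 19018),
    (1_780_477, 69, 19353),
    (1_996_186, 65, 19817),
    (2_200_000, 62, 20494),
    (2_362_500, 59, 21408),
    (2_653_125, 56, 23125),
]


def best_by(metric):
    """Return the run minimizing metric(elapsed_s, energy_j)."""
    return min(RUNS, key=lambda run: metric(run[1], run[2]))


min_energy = best_by(lambda t, e: e)
min_edp = best_by(lambda t, e: e * t)  # energy-delay product (J*s)
fastest = min(RUNS, key=lambda run: run[1])

for label, (freq, t, e) in (("lowest energy", min_energy), ("lowest EDP", min_edp)):
    print(f"{label}: {freq} kHz, {t} s, {e} J "
          f"({100 * (1 - e / fastest[2]):.1f}% less energy, "
          f"{t / fastest[1]:.2f}x the runtime of the fastest run)")
```

On these numbers, the lowest-energy point (about 1.4 GHz) saves roughly 18% of the energy of the fastest run at the cost of about 1.5x the runtime, while the energy-delay optimum sits much closer to the top frequency.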


SLURM Project Team

Research and Development:
– Dan Rusak (Bull, USA)
– Don Albert (Bull, USA)
– Martin Perry (Bull, USA)
– Yiannis Georgiou (Bull, France)
– Xavier Bru (Bull, France)

Design and Integration:
– Nancy Kritkausky (Bull, France)
– Moe Jette (SchedMD, USA)
– Danny Auble (SchedMD, USA)

Research and Design Ideas:
– Matthieu Hautreux (CEA, France)