Transcript of: Jean-Pierre Panziera, Chief Technology Director, Bull – CAS 2K13, Sept. 2013 (17 slides)

Page 1:

Jean-Pierre Panziera

Chief Technology Director CAS 2K13 – Sept. 2013

Page 2:

personal note

Page 3:

Complete solutions for Extreme Computing

HPC for every user with public/private HPC cloud

Applications optimized for hyper-parallel supercomputers

Production-ready supercomputers

Data Centers for all organizations

bullx supercomputer suite

Page 4:

Bull: from Supercomputers to Cloud Computing

Infrastructure
• Data Center design
• Mobile Data Center
• Water-Cooling

Servers
• Full range development from ASICs to boards, blades, racks
• Support for accelerators

Software
• Open, scalable, reliable SW
• Development Environment
• Linux, OpenMPI, Lustre, Slurm
• Administration & monitoring

Expertise & services
• HPC Systems Architecture
• Applications & Performance
• Energy Efficiency
• Data Management
• HPC Cloud

bullx supercomputer suite

Page 5:

Leading HPC technology with Bull

CURIE – 2011: 1st PRACE PetaFlop-scale System, Rank #9

TERA100 – 2010: 1st European PetaFlop-scale System, Rank #6

“C1” – 2013: 1st Intel Xeon E5-2600 v2 System, with Direct Liquid Cooling technology

Page 6:

Energy (Electricity): a significant part of the HPC budget

Page 7:

Industrial Electricity Prices in Europe

Source: Eurostat, year 2010

[Chart: industrial electricity prices in the EU, 1998–2010, in €/kWh, for Germany, Spain, France, Italy, the Netherlands, Poland, the United Kingdom and Norway]

http://epp.eurostat.ec.europa.eu/portal/page/portal/energy/data/main_tables#

Electricity prices highly variable across Europe – avg. 0.11 €/kWh

Electricity prices rising steadily – CAGR 12%
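To put those figures in perspective, a minimal sketch (in Python) of what the electricity line item looks like, using the average price of 0.11 €/kWh and the 12% CAGR quoted above; the 1 MW system size and the 10-year horizon are illustrative assumptions, not from the slide:

```python
# Annual electricity cost of an HPC system, projected with the price trend
# quoted on the slide (avg. 0.11 EUR/kWh, ~12% CAGR). The 1 MW system size
# and the 10-year horizon are illustrative assumptions, not slide data.
AVG_PRICE_EUR_PER_KWH = 0.11
CAGR = 0.12
SYSTEM_POWER_KW = 1000        # hypothetical 1 MW machine (IT + cooling)
HOURS_PER_YEAR = 24 * 365

for year in range(0, 11, 5):
    price = AVG_PRICE_EUR_PER_KWH * (1 + CAGR) ** year
    cost = SYSTEM_POWER_KW * HOURS_PER_YEAR * price
    print(f"year +{year:2d}: {price:.3f} EUR/kWh -> {cost / 1e6:.2f} MEUR/year")
```

At 1 MW that is roughly 1 M€ per year today, and about three times as much after a decade if the 12% trend holds.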

Page 8:

Power to the datacenter

MEGAWATTS

Page 9:

bullx B700 series – Direct Liquid Cooling (DLC)

Direct Liquid Cooling (DLC) rack:
– Dual-pump unit (80 kW cooling capacity)
– Power Supply Unit + optional UltraCapacitor
– Rack management (incl. Gigabit Ethernet)
– 5 chassis, each including:
  – 18 dual-processor nodes
  – Embedded 1st-level InfiniBand switch
  – Extra embedded Gigabit Ethernet switch

Silent

Extra-easy maintenance

Optimized PUE (< 1.1)

Max config (2013 / 2014):
– Processors: 6 PF / 12 PF
– Accelerators: 20 PF / 28 PF

Page 10:

Cooling & Power Usage Effectiveness (PUE)

[Chart: Power Usage Effectiveness on a scale of 1.0 to 2.0 for three cooling approaches – air-cooled (A/C, 20 kW/rack), water-cooled door (30 kW/rack) and Direct Liquid Cooling (70 kW/rack); co-generation is also indicated]

Cold-plate design note: we designed a flexible cold plate to cool the two CPUs and the IOH. We chose a technology adapted with the manufacturer: all copper; the same fixations as for the heatsink (so the tube avoids the IOH fixation); only half of the tube exchanges heat; inner diameter 4 mm. "Easy" to cool the CPU with "hot" water: a 45°C water inlet at the blade keeps Tcase < 70°C. Prototype tested – not developed for the chassis test.
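PUE (Power Usage Effectiveness) is total facility power divided by the power that reaches the IT equipment. A minimal sketch of that arithmetic; the cooling and overhead figures below are hypothetical, chosen only to illustrate how the three approaches separate, and only the per-rack IT densities come from the slide:

```python
# PUE = total facility power / IT equipment power.
# The cooling/overhead figures are hypothetical illustrations, not slide data;
# only the per-rack IT densities (20 / 30 / 70 kW) come from the slide.
def pue(it_kw: float, cooling_kw: float, other_kw: float) -> float:
    """Power Usage Effectiveness for one facility power budget."""
    return (it_kw + cooling_kw + other_kw) / it_kw

print(f"air-cooled (20 kW/rack):        PUE ~ {pue(20.0, 16.0, 2.0):.2f}")
print(f"water-cooled door (30 kW/rack): PUE ~ {pue(30.0, 12.0, 2.0):.2f}")
print(f"direct liquid (70 kW/rack):     PUE ~ {pue(70.0, 5.0, 1.0):.2f}")
```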

Page 11:

With hot-water-cooled servers, water chillers are no longer required

[Diagram: Direct Liquid Cooling infrastructure – N × 42U bullx DLC racks, 90 compute nodes each, on a 45°C water loop; a cooling tower delivering water at up to 35°C replaces the water chiller; cold-air and hot-air paths shown]
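As a rough sanity check on that loop, a minimal sketch (in Python) of the water flow needed to carry away one rack's 80 kW heat load, assuming the 10 K spread between the 35°C and 45°C figures above; the heat capacity and density are standard values for water:

```python
# Water flow needed to remove one rack's heat load for a given temperature
# spread. The 80 kW load comes from the DLC rack slide; the 10 K spread is
# taken from the 35 degC / 45 degC figures above. Water properties are
# standard textbook values.
HEAT_LOAD_W = 80_000     # per-rack cooling capacity (dual-pump unit)
DELTA_T_K = 10.0         # 45 degC - 35 degC
CP_WATER = 4186.0        # J/(kg*K)
RHO_WATER = 1000.0       # kg/m^3 (approximation)

mass_flow_kg_s = HEAT_LOAD_W / (CP_WATER * DELTA_T_K)
vol_flow_l_min = mass_flow_kg_s / RHO_WATER * 1000.0 * 60.0
print(f"~{mass_flow_kg_s:.1f} kg/s, i.e. ~{vol_flow_l_min:.0f} l/min per rack")
```

That works out to roughly 2 kg/s, i.e. on the order of 115 litres per minute per rack.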

Page 12:

Where do all these Watts go?

[Photo: compute blade with the main power consumers labelled – 2x Xeon-EP processors, 8x DDR3 DIMMs, fans, HDD/SSD, and the connector to the backplane]

Page 13:

Node consumption varies with workload

Power relative to Linpack (max = 100%):
– Memory streaming: 75%
– Irregular memory access: 55%
– Idle: 25%

Using turbo is never energy efficient
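Behind that last claim, a minimal energy-to-solution sketch; the wattages and the speed-up are hypothetical, chosen only to show that when turbo raises power faster than it shortens the run, the job consumes more energy overall:

```python
# Energy-to-solution = average power x runtime. The figures are hypothetical:
# turbo is assumed to add ~20% power for ~8% more speed, which is enough to
# make the shorter run cost more energy than the nominal one.
def energy_kwh(power_w: float, runtime_s: float) -> float:
    return power_w * runtime_s / 3.6e6   # J -> kWh

nominal = energy_kwh(power_w=300.0, runtime_s=3600.0)
turbo = energy_kwh(power_w=360.0, runtime_s=3600.0 / 1.08)

print(f"nominal {nominal:.3f} kWh vs. turbo {turbo:.3f} kWh "
      f"({100.0 * (turbo / nominal - 1.0):+.0f}% energy)")
```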

Page 14:

Power Management

Accounting:
– Users billed separately for CPU, IO, … and energy (see the sketch below)
– Keep the compute center electricity bill within budget

Control power:
– Avoid running over capacity
– Allow for priority jobs
– Adjust power consumption with electricity cost

Energy consumption / cost optimization:
– Fine & precise power monitoring
– Power data analysis
– Control the power of all system resources
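A minimal sketch of the accounting idea (the rates, the sample data and the job_cost helper are hypothetical, not part of the bullx suite): the energy charge is simply the integral of the node power samples over the job's runtime, priced per kWh alongside the CPU-hours.

```python
# Per-job billing that charges CPU-hours and measured energy separately.
# Rates and the power trace are hypothetical; the trace stands in for the
# node-power monitoring described on the following slides.
RATE_PER_CORE_HOUR_EUR = 0.03   # hypothetical compute rate
RATE_PER_KWH_EUR = 0.11         # average electricity price from the slides

def job_cost(cores, runtime_s, power_samples_w, sample_period_s):
    cpu_hours = cores * runtime_s / 3600.0
    energy_kwh = sum(power_samples_w) * sample_period_s / 3.6e6
    return cpu_hours * RATE_PER_CORE_HOUR_EUR + energy_kwh * RATE_PER_KWH_EUR

# 16 cores for one hour, node power sampled once per second around 350 W.
samples = [350.0] * 3600
print(f"{job_cost(16, 3600.0, samples, 1.0):.2f} EUR")
```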

… enter software

bullx supercomputer suite:
– Operating System: bullx Linux – Red Hat Enterprise Linux (RHEL) or SUSE Linux Enterprise Server (SLES)
– Application Management
– Development Environment: bullx DE
– Execution Environment: bullx BM (Batch Management), bullx MPI (Message Passing Interface)
– Data Management: bullx PFS – Parallel File System (Lustre), NFS – Network File System, Local File System
– Supercomputer Management: bullx MC – Management Center, with Software Manager, Monitoring & Control Manager and SIR Infrastructure Manager

Page 15:

Bull - TU Dresden high frequency monitoring

[Diagram: instrumented cluster at TU Dresden – IB fabric / GigE fabric, SATA/SAS storage (FASS, Phase 1), 2 login, 2 admin, 2 export and 2 SMP nodes, a 32-node GPU cluster with 48x Kepler, and instrumented blades]

[Diagram: per-node instrumentation – a power-measurement FPGA samples the voltage regulators of CPU1, CPU2 and 8 DIMMs over I2C (400 kb per VR) at 500 Hz with 2% error; node global power is exposed via the BMC (Baseboard Management Controller), the operating system (XBUS, 50 Mb) and 100 Mb Ethernet to the cluster manager, through a Power Monitor API]
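A minimal sketch of what a consumer of this instrumentation could do with a 500 Hz node-power trace (the trace below is synthetic; the actual FPGA/BMC read-out path is not modelled): integrate power into energy and carry the quoted 2% accuracy through to the result.

```python
# Turn a 500 Hz power trace into an energy figure with the 2% error quoted
# for the instrumentation. The trace is synthetic (a node around 120 W with
# a small ripple); on the real system it would come from the FPGA via the BMC.
import math

SAMPLE_HZ = 500.0
REL_ERROR = 0.02

def energy_joules(power_samples_w):
    """Energy over the trace, plus the uncertainty from the 2% accuracy."""
    energy = sum(power_samples_w) / SAMPLE_HZ   # sum(P_i) * dt
    return energy, energy * REL_ERROR

trace = [120.0 + 5.0 * math.sin(i / 50.0) for i in range(int(60 * SAMPLE_HZ))]
energy, err = energy_joules(trace)
print(f"{energy / 1000:.2f} kJ +/- {err / 1000:.2f} kJ over 60 s")
```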

Page 16:

Energy efficient HPC systems …

Green systems: interest driven by energy cost and a green attitude

Green systems start with green components:
– CPUs: power only where/when needed, throttle frequency
– Memory: DDR4 saves 20-30%
– PSUs: optimize AC/DC and DC/DC conversion
– Interconnect…

Direct Liquid Cooling saves on CAPEX (chillers) & OPEX (electricity)

Non-intrusive, high-definition power monitoring

Power: another parameter in system usage optimization

Energy-aware batch scheduler (a minimal sketch follows below)
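To make that last point concrete, a minimal sketch of one thing an energy-aware scheduler can do (the job list, the power estimates and the 1 MW cap are hypothetical; production schedulers such as Slurm expose this kind of behaviour through configuration and plugins rather than code like this): launch jobs by priority only while the facility power cap allows it.

```python
# Greedy power-capped scheduling pass: start jobs in priority order as long
# as the estimated facility power stays under the cap. All data hypothetical.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: int
    est_power_kw: float

def schedule(queue, free_power_kw):
    started = []
    for job in sorted(queue, key=lambda j: -j.priority):
        if job.est_power_kw <= free_power_kw:
            free_power_kw -= job.est_power_kw
            started.append(job)
    return started

queue = [Job("cfd", 10, 400.0), Job("qcd", 8, 700.0), Job("post", 1, 50.0)]
for job in schedule(queue, free_power_kw=1000.0):
    print(f"start {job.name} ({job.est_power_kw:.0f} kW)")
```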

Page 17: