Introduction to High Performance Computing at ZIHmlieber/slides/Architecture.pdf · 2010-06-22 ·...

Zellescher Weg 12

Trefftz-Bau/HRSK 151

Phone +49 351 - 463 - 39871

Guido Juckeland ([email protected])

Center for Information Services and High Performance Computing (ZIH)

Introduction to High Performance Computing at ZIH

Architecture of the PC Farm (Deimos)

Slide 2 - Guido Juckeland

Agenda

PC Farm Components

AMD Opteron Processors and Systems

Infiniband Networks

Slide 3 - Guido Juckeland

PC Farm Components (Deimos)

Slide 4 - Guido Juckeland

Linux Networx PC-Farm (Deimos)

1292 AMD Opteron x85 dual-core CPUs (2.6 GHz)

726 compute nodes with 2, 4 or 8 CPU cores

2 GiByte main memory per core

2 Infiniband interconnects (MPI- and I/O-Fabric)

68 TByte SAN-Storage

70, 150 or 290 GByte scratch disk per node

OS: SuSE SLES 10

Batch system: LSF

Compilers: PathScale, PGI, Intel, GNU

3rd party applications: Ansys100, CFX, Fluent, Gaussian, LS-DYNA, Matlab, MSC,…
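
The headline figures on this slide can be combined into aggregate numbers. The following is a minimal sketch, not an official specification of the machine. The CPU count, cores per CPU, memory per core and clock rate come from the slide. The value of 2 floating-point operations per core and cycle is an assumption added here, and all variable names are illustrative.

```python
# Figures quoted on the slide.
NUM_CPUS = 1292          # AMD Opteron x85 dual-core CPUs
CORES_PER_CPU = 2
MEM_PER_CORE_GIB = 2
CLOCK_GHZ = 2.6

# Assumption, not from the slide: 2 double-precision floating-point
# operations per core and clock cycle.
FLOPS_PER_CYCLE = 2

total_cores = NUM_CPUS * CORES_PER_CPU
nominal_mem_gib = total_cores * MEM_PER_CORE_GIB
# cores * GHz * flop/cycle gives GFlop/s; divide by 1000 for TFlop/s.
peak_tflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

print(f"CPU cores:           {total_cores}")
print(f"Nominal main memory: {nominal_mem_gib} GiByte")
print(f"Theoretical peak:    {peak_tflops:.1f} TFlop/s (assumed flop rate)")
```

The memory figure is nominal: it applies 2 GiByte to every core and therefore ignores any nodes equipped with more memory.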

Slide 5 - Guido Juckeland

Deimos - Partitions

2 Master Nodes

– Not accessible to users; used for PC farm management

4 Login Nodes

– 4 Core Nodes

– Accessible with DNS Round Robin under deimos.hrsk.tu-dresden.de

Single, dual and quad nodes

– 1, 2 or 4 CPUs

– 4, 8 or 16 GiByte main memory (24 quad nodes have 32 GiByte)

– 80, 160 or 300 GByte local disks

Set up as phase 1 and phase 2 nodes

– Identical hardware

– Differences in the connection to the MPI- and the I/O-Fabric (later)
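
The node classes listed above can be written down as a small table and checked against the "2 GiByte per core" figure. This is only a sketch: the `NodeType` class and its field names are hypothetical, and pairing the values by position (1 CPU with 4 GiByte and 80 GByte, and so on) is an assumption based on the order in which the slide lists them.

```python
from dataclasses import dataclass

CORES_PER_CPU = 2  # all CPUs in the farm are dual-core Opterons


@dataclass(frozen=True)
class NodeType:
    name: str
    cpus: int
    mem_gib: int
    disk_gbyte: int

    @property
    def cores(self) -> int:
        return self.cpus * CORES_PER_CPU

    @property
    def mem_per_core_gib(self) -> float:
        return self.mem_gib / self.cores


# Values paired by their position in the slide's lists (assumption).
NODE_TYPES = [
    NodeType("single", cpus=1, mem_gib=4, disk_gbyte=80),
    NodeType("dual", cpus=2, mem_gib=8, disk_gbyte=160),
    NodeType("quad", cpus=4, mem_gib=16, disk_gbyte=300),
]
# The 24 large-memory quad nodes mentioned on the slide.
BIG_QUAD = NodeType("quad (32 GiByte)", cpus=4, mem_gib=32, disk_gbyte=300)

for node in NODE_TYPES + [BIG_QUAD]:
    print(f"{node.name:18s} {node.cores} cores, "
          f"{node.mem_per_core_gib:.0f} GiByte per core")
```

With this pairing, the standard node types all come out at 2 GiByte per core, and the large-memory quad nodes at 4 GiByte per core.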

Slide 6 - Guido Juckeland

AMD Opteron Processors and Systems

Slide 7 - Guido Juckeland

AMD Opteron CPU - Design

AMD Opteron x85 (2.6 GHz)

Memory controller on-chip (2 memory channels with 3.2 GiByte/s transfer bandwidth each)

Each core: 64 KiByte level 1 instruction cache and 64 KiByte level 1 data cache

1 MiByte level 2 cache

64-bit extension of the IA-32 x86 architecture (x86-64, x64 or EM64T)

Now also available as quad-core CPUs
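
The memory figures above translate into a per-CPU and per-core bandwidth budget. The sketch below uses the channel count, channel bandwidth and clock rate from the slide. The even split of bandwidth between the two cores is an assumption made here for illustration (a single active core could use more), and the variable names are not from the original.

```python
CHANNELS = 2
BW_PER_CHANNEL_GIB_S = 3.2   # GiByte/s, from the slide
CORES_PER_CPU = 2            # dual-core Opteron x85
CLOCK_GHZ = 2.6

bw_per_cpu = CHANNELS * BW_PER_CHANNEL_GIB_S

# Assumption: both cores are active and share the bandwidth evenly.
bw_per_core = bw_per_cpu / CORES_PER_CPU

# Bytes that can be fetched from main memory per core and clock cycle.
bytes_per_cycle = bw_per_core * 2**30 / (CLOCK_GHZ * 1e9)

print(f"Per CPU:  {bw_per_cpu:.1f} GiByte/s")
print(f"Per core: {bw_per_core:.1f} GiByte/s")
print(f"Per core and cycle: {bytes_per_cycle:.2f} bytes")
```

Under these assumptions a core receives only slightly more than one byte per clock cycle from main memory, which is why the caches matter for performance.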

Slide 8 - Guido Juckeland

AMD Opteron – Block diagram

[Block diagram. Labelled components: instruction TLB and level 1 instruction cache; fetch, pick, decode and pack stages; three 8-entry schedulers with ALU and AGU units; one 36-entry scheduler with FADD, FMUL and FMISC units; data TLB and level 1 data cache with ECC; 2k branch targets, 16k history counter, RAS and target address; level 2 cache with L2 ECC, L2 tags and L2 tag ECC; System Request Queue (SRQ); crossbar (XBAR); memory controller and HyperTransport.]

Slide 9 - Guido Juckeland

Deimos – Layout of a single-CPU node

[Diagram: one AMD Opteron 185 with 4 GiByte of memory, connected via HyperTransport to the peripheral devices (Infiniband, Ethernet, disk).]

Slide 10 - Guido Juckeland

Deimos – Layout of a dual-CPU node

[Diagram: two AMD Opteron 285 CPUs, each with 4 GiByte of memory. HyperTransport links connect the CPUs and the peripheral devices (Infiniband, Ethernet, hard disk).]

Slide 11 - Guido Juckeland

Deimos – Layout of a quad-CPU node

[Diagram: four AMD Opteron 885 CPUs, each with 4 GiByte of memory. HyperTransport links connect the CPUs and the peripheral devices (Infiniband, Ethernet, hard disk).]

Slide 12 - Guido Juckeland

Infiniband Networks

Slide 13 - Guido Juckeland

Basic Layout

Slide 14 - Guido Juckeland

More complicated structures

Slide 15 - Guido Juckeland

Infiniband-Stack

Slide 16 - Guido Juckeland

Consequences for the user

No standard Linux network interfaces (eth0, ...)

No IP addresses

No direct traffic monitoring possible

Very low MPI latency (about 5-15 μs)

High MPI bandwidth (up to 900 MiByte/s)

The batch system does not know about the state of the Infiniband fabric
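
The latency and bandwidth figures above can be combined into a rough estimate of message transfer times. The sketch below uses a simple linear cost model (time = latency + size / bandwidth). The model is an assumption introduced here for illustration, not something the slides prescribe, and the function names are hypothetical. Real MPI performance also depends on message protocol, node placement and fabric load.

```python
MIB = 2**20


def transfer_time_us(size_bytes, latency_us, bandwidth_mib_s=900.0):
    """Estimated time in microseconds to send one message (linear model)."""
    bytes_per_us = bandwidth_mib_s * MIB / 1e6
    return latency_us + size_bytes / bytes_per_us


def effective_bandwidth_mib_s(size_bytes, latency_us, bandwidth_mib_s=900.0):
    """Bandwidth actually achieved for one message, latency included."""
    seconds = transfer_time_us(size_bytes, latency_us, bandwidth_mib_s) / 1e6
    return size_bytes / MIB / seconds


# Compare small and large messages for both ends of the latency range.
for size in (8, 1024, 64 * 1024, 16 * MIB):
    for latency in (5.0, 15.0):
        bw = effective_bandwidth_mib_s(size, latency)
        print(f"{size:>9d} bytes, {latency:4.1f} us latency: "
              f"{bw:7.1f} MiByte/s effective")
```

In this model small messages are dominated by latency, so the peak bandwidth is only approached for messages of many KiByte or more.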

Slide 17 - Guido Juckeland

Deimos Infiniband-Layout (rough sketch)

[Diagram: every node is connected to two separate Infiniband networks, the MPI network and the I/O network.]

Slide 18 - Guido Juckeland

Deimos MPI-Fabric

+-------------------+       +--------------------+       +-------------------+
| Switch 1          |       | Switch 2           |       | Switch 3          |
|                   |  30x  |                    |  30x  |                   |
| Rack 05           |-------| Rack 20            |-------| Rack 25           |
|                   |       |                    |       |                   |
| all Phase1 Nodes  |       | Phase2 Duals+Quads |       | Phase 2 Singles   |
+-------------------+       +--------------------+       +-------------------+

Three 288-port Voltaire ISR 9288 IB switches with 4x Infiniband ports
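
The switch layout above implies a port budget for each switch. The sketch below is a rough estimate under stated assumptions: "30x" is read as 30 parallel links between neighbouring switches, each link occupies one port on both ends, and the switches form a chain as drawn. None of this is confirmed by the slides beyond the diagram itself, and the names in the code are illustrative.

```python
PORTS_PER_SWITCH = 288
LINKS_PER_CONNECTION = 30   # assumed meaning of "30x" in the diagram

# Chain taken from the diagram: Switch 1 - Switch 2 - Switch 3.
SWITCHES = ["Switch 1", "Switch 2", "Switch 3"]
CONNECTIONS = [("Switch 1", "Switch 2"), ("Switch 2", "Switch 3")]


def node_ports(switch):
    """Ports left for compute nodes after inter-switch links are deducted."""
    used = sum(LINKS_PER_CONNECTION for pair in CONNECTIONS if switch in pair)
    return PORTS_PER_SWITCH - used


free = {name: node_ports(name) for name in SWITCHES}
for name, ports in free.items():
    print(f"{name}: {ports} ports available for nodes")
print(f"Total: {sum(free.values())} ports")
```

In this reading, traffic between nodes on Switch 1 and Switch 3 has to cross both inter-switch connections, so jobs spanning phase 1 nodes and phase 2 single nodes may see less bandwidth than jobs confined to one switch.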

Slide 19 - Guido Juckeland

Deimos I/O Fabric

Tree structure with

– One 192-port Voltaire ISR 9288 IB switch with 4x Infiniband ports (Rack 07)

– 36 passive 24-port Mellanox IB switches (4x)

[Diagram: the Voltaire core switch at the root of the tree, with the 24-port Mellanox switches below it, grouped into phase 1 and phase 2.]