Introduction to High Performance Computing at UCL - UCLouvain, October 16th 2014


Page 1: Introduction to HPC at UCL

Institut de calcul intensif et de stockage de masse

● Technical reminders and available equipment

• source code
• compiling
• optimized libraries: BLAS, LAPACK
• OpenMP
• MPI

● Job submission: SGE, Condor, SLURM

● CISM: working principles, management, access

● Machine room visit

● From algorithm to computer program: optimization and parallel code

October 16th 2014 Damien François and Bernard Van Renterghem

Page 2: Cache memory

Program execution = information exchange between the CPU and RAM (program instructions and data).

RAM is slow, and instructions are mostly read sequentially, hence cache memory: instructions and/or data are transferred from RAM to the cache by entire blocks.

Cache levels: L1, L2, L3.
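The block transfer is why access order matters. A minimal sketch (not from the slides): Fortran stores arrays column by column, so the first loop below walks through memory contiguously and reuses each cached block, while the second jumps between distant addresses and misses the cache.

program cache_order
  implicit none
  integer, parameter :: nl = 2000, nc = 2000
  real*8 :: a(nl, nc), s
  integer :: i, j

  call random_number(a)
  s = 0.0d0

  ! cache-friendly: the innermost index runs over the first dimension,
  ! matching Fortran's column-major storage (stride-1 accesses)
  do j = 1, nc
     do i = 1, nl
        s = s + a(i, j)
     end do
  end do

  ! cache-hostile: consecutive accesses are nl*8 bytes apart,
  ! so almost every access fetches a new block from RAM
  do i = 1, nl
     do j = 1, nc
        s = s + a(i, j)
     end do
  end do

  print *, s
end program cache_order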

Page 3: Clusters

Large number of standard, low-cost elements

Network performance is critical

[Diagram: several low-cost computers connected by a network]

Page 4: Symmetric multi-processors

Page 5: Equipment: servers

Page 6: Equipment: servers

CISM servers

● Manneback
● Green
● (Ingrid CP3)
● « exotic » machines

UCL CECI servers

● Hmem
● Lemaitre2

CECI servers

● Vega (ULB)
● Hercules (UNamur)
● Dragon1 (UMons)
● Nic4 (ULg)

Page 7: Equipment: clusters

● Named after Charles Manneback (1894-1975), a friend of Georges Lemaître

● 66 nodes (528 cores) from the old Green cluster of 2008, but still powerful
● 21 nodes in the Indus partition (336 cores)
● 9 nodes in the Oban partition (288 cores)
● 8 nodes in the Ivy partition (128 cores)

● 132 nodes, 1096+336 cores, 3720+1344 GB RAM, 1.7 Tflops

● Installed compilers: GCC, Intel, PGI

● OS: GNU/Linux CentOS 6.5 (kernel 2.6.32)

● Batch system: SLURM 14.03.6

Manneback

Page 8: Equipment: clusters

Manneback

[ASCII-art login banner spelling « manneback »]

Charles Manneback, Lemaitre fellow cluster

(GNU/Linux CentOS 6.5) front-end: 2x 8-core E5-2650 @ 2 GHz / 64 GB RAM

mb007      1 node    8-core X5500 / 24 GB RAM, 2 GPU Tesla C1060
mb008-019  12 nodes  8-core L5520 / 24 GB, Infiniband SDR 10 Gbps
mb020-035  14 nodes  8-core L5420 / 16 GB
mb040      1 node    16-core E5-2660 / 64 GB, GPU Tesla M2090, Xeon Phi
mb050-095  56 nodes  8-core L5420 / 16 GB or 32 GB (old Green)
mb101-121  21 nodes  16-core E5-2650 / 64 GB (Indus, NAPS project)
mb140-149  10 nodes  8-core L5420 / 32 GB (old Green)
mb151-156  6 nodes   32-core AMD Opteron 6276 / 128 GB (Oban)
mb158-160  3 nodes   32-core E5-4640 / 128 or 256 GB (Oban)
mb161-168  8 nodes   16-core E5-2650v2 / 64 GB (Ivy)

Total: 1096 (+336) cores / 3720 (+1344) GB RAM

Page 9: Equipment: clusters

Green

● Installed in 2008, now being deprecated
● Still 39 nodes, 312 cores, 1248 GB RAM, 3120 Gflops
● 8-core Xeon L5420 @ 2.5 GHz
● 32 GB RAM per node
● 1 NFS server with 14 TB of storage (SATA disks) for /home
● Installed compilers: GCC, Intel, PGI
● OS: Scientific Linux 5.9 with ClusterVision OS (kernel 2.6.18)
● Batch system: Sun Grid Engine 6.1

Page 10: Equipment: clusters

Ingrid (CP3, for CERN CMS/LHC), 516 cores

● 1 front-end node (AMD Opteron 2.2 GHz) with 4 GB RAM
● 17 nodes with 2 dual-core CPUs (Xeon 5160 3.0 GHz) and 4 GB RAM
● 16 nodes with 2 quad-core CPUs (Xeon 5345 2.3 GHz) and 16 GB RAM
● 12 nodes with dual CPUs (AMD Opteron 248 2.2 GHz) and 3 GB RAM
● 32 nodes with 2 quad-core CPUs (Xeon E5420 2.5 GHz) and 16 GB RAM
● 8 nodes with 2 dual-core CPUs (Xeon 2.6 GHz) and 4 GB RAM
● Storage: ingrid-fs 11 TB + 6x 11 TB (CMS) + 3x 36 TB (CMS)
● OS: Scientific Linux CERN 4.7 (kernel 2.6.9)
● Gigabit Ethernet
● Batch system: Condor

Page 11: Equipment: exotic machines

Other peculiar machines:

● Lm9: interactive Matlab, ThermoCalc, R, …
  2x 6-core CPUs, 144 GB RAM

● Lmgpu (= mb007): dual quad-core X5500 (85 Gflops) + 2x Tesla M1060 = 240 GPU cores, 624 SP Gflops, 77 DP Gflops (GPU)

● Mb40: dual octa-core E5-2660 + 2x Tesla M2090 = 512 GPU cores, 1332 SP Gflops, 666 DP Gflops (GPU) + Xeon Phi, 61 cores, 1011 DP Gflops

● SCMS-SAS 1&2: for SAS, STATA, R, … dual hexa-core CPUs + Tesla C2075 = 512 GPU cores, 127 Gflops + 1332 SP Gflops (GPU)

● LmPp001-003: Lemaitre2 post-processing, Nvidia Quadro 4000 = 256 GPU cores, 486 SP Gflops, 243 DP Gflops (GPU)

Page 12: Equipment: CECI clusters

● 16 Dell PowerEdge R815 + 1 HP + 3 Ttec

● 17x 48-core AMD Opteron 6174 (Magny-Cours) @ 2.2 GHz
  + 3x 8-core AMD Opteron 8222 @ 3 GHz (24h partition)

● RAM: 2 nodes with 512 GB, 7 with 256 GB, 8 with 128 GB, 3 with 128 GB

● /scratch 3.2 TB or 1.7 TB

● Infiniband 40 Gb/s

● SLURM batch queuing system


● Total: 840 cores, 4128 GB RAM, 31 TB /scratch, 11 TB /home, 7468 Gflops

Hmem ( www.ceci-hpc.be )

Page 13: Equipment: CECI clusters

● 112 HP DL380 nodes with 2x 6-core CPUs and 48 GB RAM

● /scratch (Lustre FS) 120 TB, /tmp 325 GB

● Infiniband 40Gb/s

● SLURM batch queuing system


● Total: 1344 cores, 5.25 TB RAM, 120 TB /scratch, 30 TB /home, 13.6 Tflops

Lemaitre2 ( www.ceci-hpc.be )


Page 14: Equipment: CECI clusters

ULB + UNamur + UMons + UCL + ULg = CECI

See www.ceci-hpc.be/clusters.html

Page 15: To reduce computing time…

… improve your code

● choice of algorithm
● source code
● optimized compiling
● optimized libraries

… use parallel computation

● OpenMP (mostly on SMP machines)
● MPI

Page 16: Source code

• Algorithm choice: does the volume of computation grow with n, n x n, …? Is the algorithm stable?

● indirect addressing (pointers) is expensive
● fetching order of array elements (for optimal use of the cache memory)
● loop efficiency (move all unnecessary bits and pieces out of loops; see the sketch below)

• Programming language: FORTRAN, C, C++, …?

• Coding practice
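A minimal sketch of the loop-efficiency point (an assumed illustration, not from the slides): the first loop recomputes a loop-invariant quantity at every iteration; the second computes it once, outside the loop.

program loop_hoist
  implicit none
  integer, parameter :: n = 1000000
  real*8 :: x(n), y(n), mean
  integer :: i

  call random_number(x)

  ! inefficient: sum(x)/n is a loop invariant, yet it is
  ! recomputed (an O(n) operation) at every iteration
  do i = 1, n
     y(i) = x(i) - sum(x)/n
  end do

  ! efficient: hoist the invariant out of the loop
  mean = sum(x) / n
  do i = 1, n
     y(i) = x(i) - mean
  end do

  print *, y(1)
end program loop_hoist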

Page 17: Compiling

• The compiler…

● translates an instruction list written in a high-level language into a machine-readable (binary) file [= the object file]

e.g. ifc -c myprog.f90 generates the object file myprog.o

● links binary object files to produce an executable file

e.g. ifc -o myprog module1.o libmath.a myprog.o generates the executable file (= the program) myprog

• Optimization options: -O1, -O2, -O3

• Compilers differ widely in quality!
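The same two steps with GCC's gfortran, shown as an assumed equivalent of the ifc commands above, this time with optimization enabled:

gfortran -O3 -c myprog.f90                            (compile: produces myprog.o)
gfortran -O3 -o myprog module1.o myprog.o libmath.a   (link: produces the executable myprog)

The library is listed after the object files that reference it, since most linkers resolve symbols from left to right.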

Page 18: Optimized libraries: BLAS

Basic Linear Algebra Subroutines

● a set of optimized subroutines handling vector x vector, matrix x vector, and matrix x matrix operations (for real and complex numbers, single or double precision)

● the subroutines are optimized for a specific machine (CPU/OS)

● See http://www.netlib.org/blas
● Example…

Page 19: Optimized libraries: BLAS

● compiling from the BLAS source: ifc -o mvm sgemv.f mvm.f

● compiling with a pre-compiled BLAS library (optimized for Intel CPUs):

ifc -o mvm mvm.f sblasd13d.a

real matlxc(nl, nc)
real vectc(nc), result(nl)

call random_number(matlxc)
call random_number(vectc)

! hand-written matrix-vector product
do i = 1, nl
   result(i) = 0.0
   do j = 1, nc
      result(i) = result(i) + matlxc(i,j)*vectc(j)
   end do
end do

! the same product through BLAS; SGEMV is the single-precision
! routine, so the arrays and alpha/beta constants are single precision
call SGEMV('N', nl, nc, 1.0, matlxc, nl, vectc, 1, 0.0, result, 1)
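For reference, the fragments above assemble into a self-contained program; the 10,000 x 5,000 dimensions are borrowed from the benchmark on the next page:

program mvm
  implicit none
  integer, parameter :: nl = 10000, nc = 5000
  real :: matlxc(nl, nc), vectc(nc), result(nl)
  integer :: i, j

  ! fill the matrix and vector with random test data
  call random_number(matlxc)
  call random_number(vectc)

  ! hand-written matrix-vector product (the DO-loop version)
  do i = 1, nl
     result(i) = 0.0
     do j = 1, nc
        result(i) = result(i) + matlxc(i,j)*vectc(j)
     end do
  end do

  ! the same computation through the BLAS routine
  call SGEMV('N', nl, nc, 1.0, matlxc, nl, vectc, 1, 0.0, result, 1)
  print *, result(1)
end program mvm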

Page 20: Optimized libraries: BLAS

Performance comparison of Intel and PGI FORTRAN compilers, for self-made code, BLAS code and pre-compiled optimized libraries (matrix 10,000 x 5,000)

Compiler      Subroutine     Options  Mflops
Intel (ifc)   DO loop        -O0      11
                             -O3      11
              BLAS source    -O0      42
                             -O3      115
              BLAS compiled  -O0      120
                             -O3      120
PGI (pgf90)   DO loop        -O0      11
                             -O3      11
              BLAS source    -O0      48
                             -O3      57
              BLAS compiled  -O0      116
                             -O3      119

Page 21: Optimized libraries: LAPACK

• Linear Algebra subroutines:

● linear equation systems: Ax = b
● least squares: min ||Ax - b||²
● eigenvalue problems: Ax = λx, Ax = λBx
● for real or complex, single or double precision
● includes all utility routines (LU factorization, Cholesky, …)

• Based on BLAS (so the LAPACK routines themselves do not depend on the hardware, yet remain optimized)

• See http://www.netlib.org/lapack
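A minimal sketch (not from the slides) of the first use case, solving a linear system Ax = b with LAPACK's DGESV driver routine:

program solve
  implicit none
  integer, parameter :: n = 3
  real*8 :: a(n,n), b(n)
  integer :: ipiv(n), info

  ! random test system
  call random_number(a)
  call random_number(b)

  ! DGESV factorizes A (LU with partial pivoting) and solves Ax = b;
  ! on exit, b is overwritten with the solution x
  call DGESV(n, 1, a, n, ipiv, b, n, info)
  if (info /= 0) print *, 'DGESV failed, info =', info
  print *, 'x =', b
end program solve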

Page 22: OpenMP

• OpenMP (Open Multi-Processing): a standard set of compiler directives, functions, and environment variables for shared-memory architectures (OpenMP 2.0)

• Principle: compiler directives; the details of the parallelism are left to the compiler, hence fast implementation

The fork-and-join model:

…
!$OMP PARALLEL DO
DO I = 1, 1000
   a(i) = b(i)*c(i)
END DO
…
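The directive is all the parallelism the programmer writes. A complete, compilable sketch (the array names follow the slide; build with the compiler's OpenMP flag, e.g. gfortran -fopenmp):

program omp_demo
  use omp_lib              ! for omp_get_max_threads()
  implicit none
  integer, parameter :: n = 1000
  real*8 :: a(n), b(n), c(n)
  integer :: i

  call random_number(b)
  call random_number(c)

  ! the iterations of the loop are divided among the available threads
!$OMP PARALLEL DO
  do i = 1, n
     a(i) = b(i)*c(i)
  end do
!$OMP END PARALLEL DO

  print *, 'threads:', omp_get_max_threads(), ' a(1) =', a(1)
end program omp_demo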

Page 23: MPI environment

• MPI = Message Passing Interface (version 2.0)

• Principle: the program itself distributes the work and has full control over the data exchange and communication between nodes

• Widely used standard for clusters (but also exists for SMP boxes)

…
      REAL a(100)
…
C Process 0 sends, process 1 receives:
      if ( myrank.eq.0 ) then
         call MPI_SEND(a, 100, MPI_REAL, 1, 17, MPI_COMM_WORLD, ierr)
      else if ( myrank.eq.1 ) then
         call MPI_RECV(a, 100, MPI_REAL, 0, 17, MPI_COMM_WORLD, status, ierr)
      endif
…
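The fragment becomes a complete program once the usual MPI initialization is added around it (a sketch in free-form Fortran):

program ping
  implicit none
  include 'mpif.h'
  real :: a(100)
  integer :: myrank, ierr, status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

  ! process 0 sends array a to process 1, with message tag 17
  if (myrank == 0) then
     call random_number(a)
     call MPI_SEND(a, 100, MPI_REAL, 1, 17, MPI_COMM_WORLD, ierr)
  else if (myrank == 1) then
     call MPI_RECV(a, 100, MPI_REAL, 0, 17, MPI_COMM_WORLD, status, ierr)
     print *, 'rank 1 received, a(1) =', a(1)
  end if

  call MPI_FINALIZE(ierr)
end program ping

It is typically built and launched with the MPI wrapper and launcher, e.g. mpif90 -o ping ping.f90 followed by mpirun -np 2 ./ping.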

Page 24: Job submission

• Goal: one single task per CPU

• Principle: the user hands his program over to an automatic job management system, specifying his requirements (memory, architecture, number of cpus,…). When the requested resources become available, the job is dispatched and starts running.

• Several batch systems are used at CISM:

● Condor (on the Ingrid CP3 Tier-2)
● SGE: Sun Grid Engine (on Green)
● SLURM: on Manneback, Hmem, Lemaitre2 & the CECI clusters

Page 25: Job submission

• Submission script examples…

• To submit your job: sbatch myscript (SLURM; with SGE, use qsub myscript)

# SGE example
#! /bin/sh

#$ -pe mpich 8
#$ -l h_vmem=2G,num_proc=2
#$ -M <your email address>
myprog < mydata
...
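For the SLURM clusters, a minimal equivalent script could look as follows (the resource values are illustrative assumptions); this is the file handed to sbatch:

# SLURM example
#! /bin/sh

#SBATCH --ntasks=8             # 8 parallel tasks
#SBATCH --mem-per-cpu=2048     # 2 GB of RAM per CPU
#SBATCH --mail-user=<your email address>
#SBATCH --mail-type=END
myprog < mydata
...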

Page 26: CISM: research environment

[Word cloud of the UCL research entities using CISM: ELEN, TERM, SC, TOPO, ELIENAPS, RDGN, RECI, BSMA, IMAP, MOST, MEMA, BIB, COMU, LICR, INGI, INMA, ELIC, INFM, CP3, FACM, NAPS, LOCI, GERU, PAMO, LSM, ECON, RSPO]

Page 27: CISM

• Equipment and support available to any UCL (and CECI) member

• Equipment is acquired through projects

• Goal: joining forces to acquire and manage more powerful equipment

• Institut de Calcul Intensif et de Stockage de Masse:

● a management committee, composed of representatives of the user entities, debates and decides on strategy; its chairman is elected for four years

● offices in Mercator; machine rooms in Pythagore and Marc de Hemptinne

● daily management by the technical computing team, under the leadership of the CISM Director (elected for four years)

Page 28: CISM management team

Thomas Keutgen, CISM Director

Luc Sindic, mass-storage system administrator

Bernard Van Renterghem, system administrator & user support

Damien François, system administrator & user support

Page 29: CECI: FNRS Consortium

David Colignon, logistician

• Consortium of HPC Equipment:

● UCL: CISM, PCPM, FYNU
● ULB: IIHE and SMN
● UNamur: iSCF
● UMons: CRMM and SCMN
● ULg: SEGI (NIC)

Bertrand Chenal, logistician

Page 30: Environmental challenges

Page 31: Environmental challenges

Aquarium:

• two 60 kW water chillers

• water cooling (rack-based)

Page 32: Environmental challenges

• total hosting capacity of 120 kW

• electrical redundancy and 200 kVA UPS protection

• 5 m³ buffer tank

• redundant pumps, electrical feed through an independent UPS

Aquarium