STG WW Blue Gene & HPC Benchmark Centers
Tutorial: Introduction to the Blue Gene Facility in Rochester, Minnesota
Carlos P Sosa
Chemistry and Life Sciences Group
Advanced Systems Software Development
Rochester, MN
Rochester Blue Gene Center Team
Cindy Mestad, Certified PMP®, STG WW Blue Gene & HPC Benchmark Centers
Steve M Westerbeck, System Administrator, STG WW Blue Gene & HPC Benchmark Centers
Chemistry and Life Sciences Applications Team
Carlos P Sosa, Chemistry and Life Sciences Applications, Advanced Systems Software Development
Preface
This tutorial provides a brief introduction to the environment of the IBM Blue Gene facilities in Rochester, Minnesota.
Customers should be mindful of their own security issues. The following points should be considered:
► Sharing of userids is not an accepted practice; unique userids are required to maintain proper authentication controls
► Additional encryption of data and source code on the filesystem is encouraged
► Housekeeping procedures on your assigned frontend node and filesystem are recommended
► Report any security breaches or concerns to the Rochester Blue Gene System Administration
► Changing permissions on user-generated files for resource sharing is the responsibility of the individual user
► Filesystem cleanup at the end of the engagement is the responsibility of the customer
1. Blue Gene Hardware Overview
Blue Gene System Modularity
How is BG/P Configured?
[Diagram: BG/P configuration – service and front-end (login) nodes, file servers, and the storage subsystem connect to the compute racks over a 10GbE functional network and a 1GbE service network; the software stack includes SLES10, DB2, XLF, XLC/C++, GPFS, ESSL, TWS, and LoadLeveler (LL).]
Hierarchy
Compute nodes are dedicated to running user applications, and almost nothing else – a simple compute node kernel (CNK)
I/O nodes run Linux and provide a more complete range of OS services – files, sockets, process launch, debugging, and termination
The service node performs system management services (e.g., heartbeating, error monitoring) – largely transparent to application/system software
Looking inside Blue Gene:
Blue Gene Environment
[Diagram: Blue Gene environment – the Service Node (system console, scheduler, DB2, CMCS) and Frontend Nodes connect to the File Servers over the functional 10 Gbps Ethernet, and to the racks over a control gigabit Ethernet with JTAG/I2C access via iCon+Palomino. Each pset (Pset 0 … Pset 1151) pairs one I/O node, running Linux with a filesystem client and the ciod daemon, with its compute nodes (C-Node 0 … C-Node 63), which run CNK and the MPI application; the collective network links each pset, while compute nodes interconnect over the torus.]
IBM System Blue Gene/P® packaging hierarchy:
► Chip: 1 SoC, 13.6 GF/s, 8 MB EDRAM
► Compute Card: 1 SoC, 40 DRAMs, 13.6 GF/s, 2 GB DDR
► Node Card: 32 compute cards, 0-1 I/O cards, 435.2 GF/s, 64 GB
► Rack: 32 node cards, 13.9 TF/s, 2 TB
► System: up to 256 racks, cabled 8x8x16, up to 3.5 PF/s, up to 512 TB
BG/P Application-Specific Integrated Circuit (ASIC) Diagram
[Diagram: System-on-Chip (SoC) – quad PowerPC 450 with double FPU; memory controller with ECC; L2/L3 cache; DMA & PMU; torus network; collective network; global barrier network; 10GbE; control network; JTAG monitor.]
Blue Gene/P Job Modes Allow Flexible Use of Node Memory

What's new?
► Virtual Node (VN) Mode: all four cores run one MPI process each; no threading; memory per MPI process = ¼ node memory; MPI programming model
► Dual Node Mode: two cores run one MPI process each; each process may spawn one thread on a core not used by the other process; memory per MPI process = ½ node memory; hybrid MPI/OpenMP programming model
► SMP Node Mode: one core runs one MPI process; the process may spawn threads on each of the other cores; memory per MPI process = full node memory; hybrid MPI/OpenMP programming model
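In Dual and SMP modes the cores not occupied by MPI processes are driven with OpenMP threads inside each process. The following is a minimal hybrid MPI/OpenMP sketch of that model, illustrative only; on Blue Gene it would be compiled with a thread-safe wrapper such as mpixlc_r plus the XL OpenMP option -qsmp=omp:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One MPI process per node (SMP mode) or per pair of cores (Dual mode);
       OpenMP threads fill the cores the process owns. */
    #pragma omp parallel
    {
        printf("rank %d of %d: thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

Run in SMP mode with OMP_NUM_THREADS=4, each node prints four lines from one rank; in VN mode, four ranks per node each print one line.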
Blue Gene Integrated Networks
► Torus
– Interconnects all compute nodes
– The torus network is used for point-to-point communication
► Collective
– Interconnects compute and I/O nodes
– One-to-all broadcast functionality
– Reduction operations functionality
► Barrier
– Compute and I/O nodes
– Low-latency barrier across the system (< 1 usec for 72 racks)
– Used to synchronize timebases
► 10Gb Functional Ethernet
– I/O nodes only
► 1Gb Private Control Ethernet
– Provides JTAG, I2C, etc. access to hardware; accessible only from the Service Node system
– Boot, monitoring, and diagnostics
► Clock network
– Single clock source for all racks
HPC Software Tools for Blue Gene: IBM Software Stack
► XL (Fortran, C, and C++) compilers
– Externals preserved
– Optimized for specific BG functions
– OpenMP support
► LoadLeveler scheduler
– Same externals for job submission and system query functions
– Backfill scheduling to achieve maximum system utilization
► GPFS parallel file system
– Provides high-performance file access, as in current pSeries and xSeries clusters
– Runs on I/O nodes and disk servers
► ESSL/MASSV libraries
– Optimization library and intrinsics for better application performance
– Serial static library supporting 32-bit applications
– Callable from Fortran, C, and C++
► MPI library
– Message passing interface library, based on MPICH2, tuned for the Blue Gene architecture
Other Software Support
► Parallel file systems: Lustre at LLNL, PVFS2 at ANL
► Job schedulers: SLURM at LLNL, Cobalt at ANL, Altair PBS Pro, Platform LSF (for BG/L only), Condor HTC (porting for BG/P)
► Parallel debuggers: Etnus TotalView (for BG/L as of now, porting for BG/P), Allinea DDT and OPT (porting for BG/P)
► Libraries: FFT library – tuned functions by TU-Vienna, VNI (porting for BG/P)
► Performance tools: HPC Toolkit (MP_Profiler, Xprofiler, HPM, PeekPerf), PAPI, Tau, Paraver, Kojak
High-Throughput Computing (HTC) Modes on Blue Gene/P

[Diagram: one system partitioned into an HPC block (1024 nodes) plus HTC VNM (512 nodes), HTC DM (256 nodes), and HTC SMP (256 nodes) blocks; the HPC block runs a single MPI application across its nodes, while each HTC block runs many independent single-node application instances.]

► BG/P with HTC looks like a cluster for serial and parallel apps
► Hybrid environment … standard HPC (MPI) apps plus now HTC apps
► Enables a new class of workloads that use many single-node jobs
► Easy administration using the web-based Navigator
2. IBM Rochester Center Overview
Rochester Blue Gene Infrastructure
Shared GPFS Filesystem
Understanding Performance on Blue Gene/P
Theoretical floating-point performance:
► 1 fpmadd per cycle (double FPU)
► Total of 4 floating-point operations per cycle
► 4 floating-point operations/cycle × 850 × 10^6 cycles/s = 3,400 × 10^6 flop/s = 3.4 GFlop/s per core
► Peak performance = 13.6 GFlop/s per node (4 cores)
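The same arithmetic reproduces the packaging-hierarchy figures quoted earlier (node counts from the hierarchy slide; values rounded as in the original):

13.6 GFlop/s per node × 32 nodes per node card = 435.2 GFlop/s per node card
435.2 GFlop/s per node card × 32 node cards per rack ≈ 13.9 TFlop/s per rack
13.9 TFlop/s per rack × 256 racks ≈ 3.5 PFlop/s system peak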
Two Generations: BG/L and BG/P
3. How to Access Your Frontend Node
How to Login to the Frontend
bcssh.rochester.ibm.com
Gateway
From the gateway, ssh to your assigned front-end
Your front-end
Transferring Files
Transferring files into the Rochester IBM Blue Gene Center:
► From Windows, WinSCP can be used
Transferring to the Front-end
Use scp
bcssh:/codhome/myaccount $ scp conf_gen.cpp frontend-1:~
conf_gen.cpp 100% 46KB 45.8KB/s 00:00
Current Disk Space Limits
bcssh gateway:
► /codhome/userid directories on bcssh are limited to 300GB (shared, no quota)
– Used for transferring files in and out of the environment
Frontend node:
► /home directories have 10GB for all users, no quotas
► The /gpfs file system is 400GB in size; there are no quotas, as the file space is shared between all users on that frontend node
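Because no quotas are enforced, checking your own footprint periodically is good housekeeping. A minimal sketch using standard commands (the account name is illustrative):

$ df -h /gpfs                  # free space on the shared /gpfs file system
$ du -sh ~                     # size of your /home directory on the frontend
$ du -sh /codhome/myaccount    # size of your staging area on bcssh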
4. Compilers for Blue Gene
IBM Compilers
Compilers for Blue Gene are located on the front-end (/opt/ibmcmp)
Fortran:
► /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf
► /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf90
► /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf95
C:
► /opt/ibmcmp/vac/bg/9.0/bin/bgxlc
C++:
► /opt/ibmcmp/vacpp/bg/9.0/bin/bgxlC
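For example, cross-compiling a serial Fortran or C source for the compute nodes might look as follows; -qarch=450 -qtune=450 target the BG/P PowerPC 450 core, as in the makefiles later in this tutorial, and the file names are illustrative:

$ /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf90 -O3 -qarch=450 -qtune=450 prog.f90 -o prog
$ /opt/ibmcmp/vac/bg/9.0/bin/bgxlc -O3 -qarch=450 -qtune=450 prog.c -o prog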
GNU Compilers
The standard GNU compilers and libraries, which are also located on the frontend node, will NOT produce Blue Gene compatible binary code. The standard GNU compilers can only be used for utility or frontend code development that your application may require.
GNU cross-compilers (Fortran, C, C++) for Blue Gene are located under /opt/gnu/
Fortran: ► /opt/gnu/powerpc-bgp-linux-gfortran
C: ► /opt/gnu/powerpc-bgp-linux-gcc
C++: ► /opt/gnu/powerpc-bgp-linux-g++
Using the GNU compilers for Blue Gene is not recommended, as the IBM XL compilers offer significantly higher performance; the GNU compilers do, however, offer more flexible support for things like inline assembler.
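Invocation follows the usual cross-compiler pattern (file name illustrative):

$ /opt/gnu/powerpc-bgp-linux-gcc -O2 util.c -o util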
5. MPI on Blue Gene
MPI Library Location
MPI implementation on Blue Gene is based on MPICH-2 from Argonne National Laboratory
Include files mpi.h and mpif.h are at the location:
►-I/bgsys/drivers/ppcfloor/comm/include
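In practice the MPI compiler wrappers (e.g., mpixlc_r, used later in this tutorial) typically add this include path and the matching link flags automatically, so an explicit -I is needed mainly when invoking the base compilers directly. A sketch of both styles (the direct invocation compiles only, eliding the link step):

$ mpixlc_r -O3 -qarch=450 -qtune=450 hello.c -o hello
$ /opt/ibmcmp/vac/bg/9.0/bin/bgxlc -I/bgsys/drivers/ppcfloor/comm/include -c hello.c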
6 & 7. Compilation and Execution on Blue Gene
Copying Executables and Input
Step 1: Copy input files and executables to a shared directory
► Place data and executables in a directory under /gpfs
Example:
$ cd /gpfs/fs2/frontend-1
$ mkdir myaccount
$ cp ~myaccount/sander /gpfs/fs2/frontend-1/myaccount
$ cp ~myaccount/input.tar /gpfs/fs2/frontend-1/myaccount
Compiling on Blue Gene: C
/gpfs/fs2/frontend-11/myaccount/hello:0> make -f make.hello
mpixlc_r -O3 -qarch=450 -qtune=450 hello.c -o hello

> cat make.hello
XL_CC = mpixlc_r
OBJ = hello
SRC = hello.c
FLAGS = -O3 -qarch=450 -qtune=450
LIBS =

$(OBJ): $(SRC)
	${XL_CC} $(FLAGS) $(SRC) -o $(OBJ) $(LIBS)

clean:
	rm *.o hello
Hello World: C
> cat hello.c
#include <stdio.h>    /* Headers */
#include <string.h>   /* strcpy */
#include "mpi.h"

int main(int argc, char **argv)   /* Function main */
{
    int rank, size, tag, rc, i;
    MPI_Status status;
    char message[20];

    rc = MPI_Init(&argc, &argv);
    rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
    rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    tag = 100;

    if (rank == 0) {
        strcpy(message, "Hello, world");
        for (i = 1; i < size; i++)
            rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
    } else
        rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);

    printf("node %d : %.13s\n", rank, message);
    rc = MPI_Finalize();
    return 0;
}
Compiling on Blue Gene: C++
> cat make.hello
XL_CC = mpixlcxx_r
OBJ = hello
SRC = hello.cc
FLAGS = -O3 -qarch=450 -qtune=450
LIBS =

$(OBJ): $(SRC)
	${XL_CC} $(FLAGS) $(SRC) -o $(OBJ) $(LIBS)

clean:
	rm *.o hello
Hello World: C++
> cat hello.cc
// Include the MPI version 2 C++ bindings:
#include <mpi.h>
#include <iostream>
#include <string.h>

using namespace std;

int main(int argc, char* argv[])
{
    MPI::Init(argc, argv);

    int rank = MPI::COMM_WORLD.Get_rank();
    int size = MPI::COMM_WORLD.Get_size();

    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    memset(name, 0, MPI_MAX_PROCESSOR_NAME);
    MPI::Get_processor_name(name, len);
    memset(name + len, 0, MPI_MAX_PROCESSOR_NAME - len);

    cout << "hello_parallel.cc: Number of tasks=" << size
         << " My rank=" << rank << " My name=" << name << "." << endl;

    MPI::Finalize();
    return 0;
}

https://spaces.umbc.edu/pages/viewpage.action?pageId=5245461#C%2B%2BHelloWorldProgram-parallel
Running Programs (Applications) on Blue Gene
Job running is managed via LoadLeveler
► LoadLeveler is a job scheduler written by IBM to control the scheduling of batch jobs
► mpirun is invoked via LoadLeveler
Script to Emulate the Syntax of mpirun
llrun
pts/0:>:0>llrun
mpirun
Step 2: Job submission using mpirun
► Users can use mpirun to submit jobs
► The Blue Gene mpirun is located in /usr/bin/mpirun

Typical use of mpirun:
► mpirun -np <# of processes> -partition <block id> -cwd `pwd` -exe <executable>

Where:
-np : Number of MPI processes to be started; must fit in the available partition
-partition : The partition (block) of the Blue Gene rack on which the executable will run, e.g., R000
-cwd : The current working directory, generally used to specify where any input and output files are located
-exe : The binary program which the user wishes to execute

Example:
mpirun -np 32 -partition R000 -cwd /gpfs/fs2/frontend-11/myaccount -exe /gpfs/fs2/frontend-11/myaccount/hello
mpirun Selected Options
Selected options:
► -args : List of arguments to the executable, in double quotes
► -env : List of environment variables, in double quotes, as "VARIABLE=value"
► -mode : SMP, DUAL, or VN
For more details, run the following at the command prompt:
mpirun -h
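Combining these with the basic options above might look as follows (the partition, executable, and argument values are illustrative):

mpirun -np 32 -partition R000 -mode SMP -env "OMP_NUM_THREADS=4" \
       -cwd `pwd` -exe ./hello -args "-v -n 100"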
mpirun Selected Example in an sh Script

#!/bin/sh
# --------- User options start here --------------------
MPIRUN="mpirun"
MPIOPT="-np 32"
PARTITION="-partition R000_J203_128"
WDIR="-cwd /FS1/myaccount/amber/IIsc/b4amber_mod/data1_32"
SANDER="-exe /FS1/myaccount/amber/exe/sander_bob_noBTREE"
time_ends=1600  # till many pico seconds after 150ps
# ---------- User options end here ---------------------
...
$MPIRUN $MPIOPT $PARTITION -args "-O -i trna.md.in -o trna.${FRST}_${LAST}.out -p trna.prm.top -c trna.${PRIV}_${FRST}.res -r trna.${FRST}_${LAST}.res -x trna.${FRST}_${LAST}.crd -v trna.${FRST}_${LAST}.vel -e trna.${FRST}_${LAST}.en -inf trna.${FRST}_${LAST}.info" $WDIR $SANDER
Invoking llrun
pts/0:/gpfs/fs2/frontend-11/myaccount/test:0> llrun -np 32 -cwd /gpfs/fs2/frontend-11/myaccount/test -exe /gpfs/fs2/frontend-11/myaccount/test/hello

Output:
Submitted job: frontend-11.rchland.ibm.com.1675
Command file: llrun.myaccount.090704.1040.cmd
Output stdout: myaccount.frontend-11.$(jobid).out
       stderr: myaccount.frontend-11.$(jobid).err
       path:   /gpfs/fs2/frontend-11/myaccount/test/

Files created:
myaccount@frontend-11
pts/0:/gpfs/fs2/frontend-11/myaccount/test:1> ls
myaccount.frontend-11.1675.err  myaccount.frontend-11.1675.out  llrun.myaccount.090704.1040.cmd
llrun “cmd” File
# @ job_type = bluegene
# @ requirements = (Machine == "$(host)")
# @ class = medium
# @ job_name = myaccount.frontend-11
# @ comment = "llrun generated jobfile"
# @ error = myaccount.frontend-11.$(jobid).err
# @ output = myaccount.frontend-11.$(jobid).out
# @ environment = COPY_ALL;
# @ wall_clock_limit = 00:30:00
# @ notification = always
# @ notify_user =
# @ bg_connection = prefer_torus
# @ bg_size = 32
# @ initialdir = /gpfs/fs2/frontend-11/myaccount/test
# @ queue
/bgsys/drivers/ppcfloor/bin/mpirun -np 32 -cwd /gpfs/fs2/frontend-11/myaccount/test -exe /gpfs/fs2/frontend-11/myaccount/test/hello
ll Command Script
pts/0:/gpfs/fs2/frontend-11/myaccount/namd_test:0> cat llrun_namd.cmd

# LL section
# @ job_type = bluegene
# @ requirements = (Machine == "$(host)")
# @ class = medium
# @ job_name = myaccount.frontend-11
# @ comment = "LoadLeveler llrun script"
# @ error = $(job_name).$(jobid).err
# @ output = $(job_name).$(jobid).out
# @ environment = COPY_ALL;
# @ wall_clock_limit = 00:60:00
# @ notification = never
# @ notify_user =
# @ bg_connection = prefer_torus
# @ bg_size = 256
# @ initialdir = /gpfs/fs2/frontend-11/myaccount/namd_test
# @ queue

# mpirun section: specific to the application
/bgsys/drivers/ppcfloor/bin/mpirun -np 256 -verbose 1 -mode SMP -env "BG_MAPPING=TXYZ" -cwd /gpfs/fs2/frontend-11/myaccount/namd_test -exe ./namd2 -args "apoa1.namd"
mpirun Standalone Versus mpirun in the LL Environment
Comparison between standalone mpirun and LoadLeveler's llsubmit command:
► The job_type and requirements tags must ALWAYS be specified as listed above
► If the above command file listing were contained in a file named my_job.cmd, the job would be submitted to the LoadLeveler queue using llsubmit my_job.cmd
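A typical submit-and-monitor cycle might look as follows; llq and llcancel are the standard LoadLeveler query and cancel commands, and the job ID shown is the illustrative one from the llrun output above:

$ llsubmit my_job.cmd                       # queue the job described in the command file
$ llq                                       # check queued and running jobs
$ llcancel frontend-11.rchland.ibm.com.1675 # cancel a job by its LoadLeveler ID, if needed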
Blue Gene – Monitoring Jobs: bgstatus
Monitor the status of jobs executing on Blue Gene:
► $ bgstatus
Blue Gene – Monitoring Jobs: lljobq
Avoid Firewall Inactivity Timeout Issues
Use screen so your session survives a dropped connection.
Before (start a detachable session):
$ screen <enter>
After (reattach the session):
$ screen -r <enter>
More information:
http://www.kuro5hin.org/story/2004/3/9/16838/14935
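Named sessions help when you keep more than one alive; a small sketch using standard screen options (the session name is illustrative):

$ screen -S bgwork    # start a named session
  ... run long jobs; detach with Ctrl-A d, or lose the connection ...
$ screen -ls          # list detached sessions
$ screen -r bgwork    # reattach by name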
Appendix: Blue Gene Specific LL Keywords - 1
Appendix: Blue Gene Specific LL Keywords - 2
Appendix: Blue Gene Specific LL Keywords - 3
Appendix: Understanding Job Status - 1
Appendix: Understanding Job Status - 2
Appendix: Understanding Job Status - 3
Appendix: Hardware Naming Convention – 1
http://www.redbooks.ibm.com/redbooks/SG247417/wwhelp/wwhimpl/js/html/wwhelp.htm
Appendix: Hardware Naming Convention – 2
http://www.redbooks.ibm.com/redbooks/SG247417/wwhelp/wwhimpl/js/html/wwhelp.htm
Appendix: Understanding Job Status - 4
Help?
Where to submit questions related to the Rochester IBM Center?
References: Blue Gene/L
1. Blue Gene/L: System Administration, SG24-7178-03, Redbooks, published 27 October 2006, last updated 30 October 2006
2. Blue Gene/L: Safety Considerations, REDP-3983-01, Redpapers, published 29 June 2006
3. Blue Gene/L: Hardware Overview and Planning, SG24-6796-02, Redbooks, published 11 August 2006
4. Blue Gene/L: Application Development, SG24-7179-03, Redbooks, published 27 October 2006, last updated 18 January 2007
5. Unfolding the IBM eServer Blue Gene Solution, SG24-6686-00, Redbooks, published 20 September 2005, last updated 1 February 2006
6. GPFS Multicluster with the IBM System Blue Gene Solution and eHPS Clusters, REDP-4168-00, Redpapers, published 24 October 2006
7. Blue Gene/L: Performance Analysis Tools, SG24-7278-00, Redbooks, published 18 July 2006
8. IBM System Blue Gene Solution Problem Determination Guide, SG24-7211-00, Redbooks, published 11 October 2006

http://www.redbooks.ibm.com/