Using Kure and Killdevil Mark Reed Sandeep Sarangi ITS Research Computing.
Outline
Compute Clusters
• Killdevil
• Kure
Logging In
File Spaces
User Environment and Applications, Compiling
Job Management

Links
UNC Research Computing
• http://its.unc.edu/research
Getting started Killdevil page
• http://help.unc.edu/CCM3_031537
Killdevil FAQ
• http://help.unc.edu/CCM3_031548
Getting started Kure page
• http://help.unc.edu/ccm3_015682

What is a compute cluster? What exactly is Killdevil? Kure?

What is a compute cluster?
Some Typical Components
• Compute Nodes
• Interconnect
• Shared File System
• Software
• Operating System (OS)
• Job Scheduler/Manager
• Mass Storage

Compute Cluster Advantages
fast interconnect, tightly coupled
aggregated compute resources
• can run parallel jobs to access more compute power and more memory
large (scratch) file spaces
installed software base
scheduling and job management
high availability
data backup

Multi-Core Computing
The trend in High Performance Computing is towards multi-core or many-core computing.
More cores at slower clock speeds for less heat
Dual and quad core processors are now common.
Soon 64+ core processors will be common

Kure
An HPC/HTC research compute cluster in Research Computing (RC)
Named after the beach in North Carolina
It is pronounced like the name of Madame Curie, the Nobel Prize-winning physicist and chemist

Kure Compute Cluster
Heterogeneous Research Cluster
Hewlett Packard Blades
200+ Compute Nodes, mostly
• Xeon 5560 2.8 GHz
• Nehalem Microarchitecture
• Dual socket, quad core
• 48 GB memory
• over 1800 cores
• some higher memory nodes
Infiniband 4x QDR
priority usage for patrons
• Buy in is cheap
Storage
• /netscr – 197 TB
• Isilon space

Kure Cont.
The original configuration of Kure was mostly homogeneous, but it became increasingly heterogeneous as patrons added to it.
Most (non-patron) compute nodes have 48 GB of memory, but there are additional high memory nodes
• 3 nodes each with 192 GB of memory
• 2 nodes each with 96 GB of memory
• patron nodes with 72 GB of memory

Multi-Purpose Killdevil Cluster
High Performance Computing
• Large parallel jobs, high speed interconnect
High Throughput Computing (HTC)
• high volume serial jobs
Large memory jobs
• special nodes for extreme memory
GPGPU computing
• computing on Nvidia processors

Killdevil Nodes
Three types of nodes:
• compute nodes
• large memory nodes
• GPGPU nodes

Killdevil Cluster – Compute Nodes
Intel Xeon processors, Model X5670
Dual socket hex core (12 cores per node)
2.93 GHz clock speed for each core
48 or 96 GB memory per node

Killdevil Cluster – Compute Nodes
Intel Xeon processors, Model E5-2670
Dual socket oct core (16 cores per node)
2.60 GHz clock speed for each core
64 GB memory per node

Killdevil Cluster – Compute Nodes
68 nodes with 64 GB memory per node
604 nodes with 48 GB memory per node
68 nodes with 96 GB memory per node
total of 740 nodes with 9152 cores
• plus GPU and large memory nodes
• So 774 nodes with 9600 cores total

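The totals on this slide can be checked with a little shell arithmetic. This is only a sketch: it assumes, following the two preceding slides, that the 64 GB nodes are the 16-core E5-2670 nodes and that the 48 GB and 96 GB nodes are the 12-core X5670 nodes. The GPU host and large memory figures (32 nodes with 12 cores, 2 nodes with 32 cores) are taken from the GPGPU and extreme memory slides. The variable names are illustrative.

```shell
# Node counts are from the slide; cores per node are assumed from the
# processor slides (E5-2670: 16 cores, X5670: 12 cores).
compute_nodes=$(( 68 + 604 + 68 ))
compute_cores=$(( 68 * 16 + 604 * 12 + 68 * 12 ))

# Add the GPU hosts (32 nodes x 12 cores) and the large memory
# nodes (2 nodes x 32 cores).
total_nodes=$(( compute_nodes + 32 + 2 ))
total_cores=$(( compute_cores + 32 * 12 + 2 * 32 ))

echo "compute: ${compute_nodes} nodes, ${compute_cores} cores"
echo "total:   ${total_nodes} nodes, ${total_cores} cores"
```

Under those assumptions the sums reproduce the slide's figures of 740 nodes with 9152 cores, and 774 nodes with 9600 cores overall.
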
Killdevil Extreme Memory Nodes
2 nodes each with 1 TB of memory
• extremely large shared memory node!
Intel Xeon Model X7550
32 cores per node
2.0 GHz processors
Use the bigmem queue

Killdevil GPGPU Computing
General Purpose computing on Graphics Processing Units (GPGPU)
32 compute nodes are paired with 64 GPUs (2 GPUs per node)
• this is configurable and may vary
compute nodes are Intel Xeon X5650, 2.67 GHz, 12 cores, 48 GB memory
GPUs are Nvidia Tesla (M2070), each with 448 compute cores
Use the gpu queue

Infiniband Connections
Connections come in single (SDR), double (DDR), and quad (QDR) data rates. Now also FDR and EDR.
• Killdevil is QDR.
Single data rate is 2.5 Gbit/s in each direction per link.
Links can be aggregated: 1x, 4x, 12x.
• Killdevil is 4x.
Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data transmission rate is four-fifths of the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s per link respectively.
Data rate for Killdevil is 32 Gb/s or 4 GB/s (4x QDR).

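The arithmetic behind the 32 Gb/s figure can be written out as a short shell function. This is a sketch covering only SDR, DDR, and QDR, the rates for which the slide gives figures; the function name ib_useful_rate is made up for this example.

```shell
# Useful InfiniBand data rate in Gbit/s, following the slide's arithmetic:
#   raw rate    = 2.5 Gbit/s x data-rate multiplier x number of lanes
#   useful rate = raw rate x 8/10   (8B/10B encoding)
# Values are kept in tenths of a Gbit/s so integer arithmetic is enough.
ib_useful_rate() (
    multiplier=$1   # 1 = SDR, 2 = DDR, 4 = QDR
    lanes=$2        # 1, 4 or 12
    raw_tenths=$(( 25 * multiplier * lanes ))
    useful_tenths=$(( raw_tenths * 8 / 10 ))
    echo "$(( useful_tenths / 10 )).$(( useful_tenths % 10 ))"
)

ib_useful_rate 1 1   # SDR, 1x
ib_useful_rate 4 4   # QDR, 4x (the Killdevil configuration)
```

For 4x QDR the raw rate is 2.5 x 4 x 4 = 40 Gbit/s, and four-fifths of that is 32 Gbit/s, or 4 GB/s at 8 bits per byte.
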
Login to Killdevil/Kure
Use ssh to connect:
• ssh killdevil.unc.edu
• ssh kure.unc.edu
SSH Secure Shell with Windows
• see http://shareware.unc.edu/software.html
For use with X-Windows Display:
• ssh -X killdevil.unc.edu or ssh -X kure.unc.edu
• ssh -Y killdevil.unc.edu or ssh -Y kure.unc.edu
Off-campus users (i.e. domains outside of unc.edu) must use a VPN connection

File Spaces

Killdevil File Spaces
Home directories
• /nas02/home/<a>/<b>/<onyen>
• a = first letter of onyen, b = second letter of onyen
• hard limit of 15 GB
Scratch Space
• NOT backed up
• purged regularly (21 days or less)
• run jobs with large output in these spaces
• /netscr – 197 TB (tuned for small files)
• /lustre – 126 TB (tuned for large files)
Mass Storage
• ~/ms

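As a small illustration of the home directory layout, the path can be assembled from an onyen in the shell. The function name home_dir_for and the onyen "jdoe" are hypothetical and used only for this sketch.

```shell
# Build the home directory path /nas02/home/<a>/<b>/<onyen> from an onyen.
home_dir_for() (
    onyen=$1
    a=$(printf '%s' "$onyen" | cut -c1)   # first letter of the onyen
    b=$(printf '%s' "$onyen" | cut -c2)   # second letter of the onyen
    echo "/nas02/home/${a}/${b}/${onyen}"
)

home_dir_for jdoe   # "jdoe" is a made-up onyen
```
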
Kure File Spaces
Home directories
• /nas02/home/<a>/<b>/<onyen>
• a = first letter of onyen, b = second letter of onyen
• hard limit of 15 GB
Scratch Space
• NOT backed up
• purged regularly (21 days or less)
• run jobs with large output in these spaces
• /netscr – 197 TB (tuned for small files)
Mass Storage
• ~/ms

File System Notes
Note that the same home directory is mounted on Killdevil and Kure
Check your home file space usage with the quota command
• quota -s (this uses more readable units)
Lustre file space in Killdevil is attached via Infiniband and may be faster
Best practice for jobs with large output is to run them in scratch space, tar and compress results, and store them in mass storage.

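The best practice above might look roughly like the following. To keep the sketch runnable anywhere, it uses temporary stand-in directories: on the clusters the run directory would live under a scratch space such as /netscr and the archive would be copied to ~/ms. The directory name myrun and the file names are hypothetical.

```shell
# Stand-in directories so the sketch runs on any machine.
workdir=$(mktemp -d)
scratch_run="$workdir/scratch/myrun"    # hypothetical run directory in scratch
archive_dir="$workdir/ms"               # stands in for ~/ms
mkdir -p "$scratch_run" "$archive_dir"

# Pretend the job wrote some output files.
echo "result 1" > "$scratch_run/out.1"
echo "result 2" > "$scratch_run/out.2"

# Bundle and compress the whole run directory into a single file ...
tar -czf "$workdir/myrun.tar.gz" -C "$workdir/scratch" myrun

# ... and copy that one file to mass storage.
cp "$workdir/myrun.tar.gz" "$archive_dir/"

# List the archive contents to confirm what was stored.
tar -tzf "$archive_dir/myrun.tar.gz"
```

Storing one compressed archive rather than many small files also follows the mass storage advice about tar and zip.
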
Mass Storage
“To infinity … and beyond” - Buzz Lightyear
long term archival storage
access via ~/ms
looks like ordinary disk file system – data is actually stored on tape
“limitless” capacity
data is backed up
For storage only, not a work directory (i.e. don’t run jobs from here)
if you have many small files, use tar or zip to create a single file for better performance
Sign up for this service on onyen.unc.edu

User Environment and Applications, Compiling Code

Modules
The user environment is managed by modules. They provide a convenient way to access software applications.
Modules modify the user environment by modifying and adding environment variables such as PATH or LD_LIBRARY_PATH
Typically you set these once and leave them
Note there are two module settings, one for your current environment and one to take effect on your next login (e.g. batch jobs running on compute nodes)

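To make the effect of a module concrete, here is a rough sketch of the kind of change it makes to the environment. It is not how the module command is implemented, and the install path /opt/apps/myapp/1.0 is hypothetical, not a real location on Killdevil or Kure.

```shell
# Hypothetical install location for an application.
app_root=/opt/apps/myapp/1.0

# Roughly what adding a module does: prepend the application's
# directories to the search paths used by the shell and the loader.
PATH="$app_root/bin:$PATH"
LD_LIBRARY_PATH="$app_root/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH

# The first PATH entry is now the application's bin directory.
echo "$PATH" | cut -d: -f1
```

Changes like these last only for the current shell, which is why there is a separate login setting for sessions that start later.
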
Common Module Commands
module avail
• module avail apps
module help
Change Current Shell
• module list
• module add
• module rm
Login version
• module initlist
• module initadd
• module initrm
For more on modules, see http://help.unc.edu/CCM3_006660

Compiling on Killdevil/Kure
Serial Programming
Suites for C, C++, Fortran90, Fortran77, etc
Intel Compilers
• icc, icpc, ifort
GNU
• gcc, g++, gfortran
Portland Group (PGI)
• pgcc, pgCC, pgf90, pgf77
Generally speaking, the Intel or PGI compilers will give slightly better performance

Parallel Jobs with MPI
There are three implementations of the MPI standard installed on both systems:
• mvapich
• mvapich2
• openmpi
Performance is similar for all three, and all three run on the IB fabric. Mvapich is the default. Openmpi and mvapich2 implement more of the MPI-2 features.

Compiling MPI programs
Use the MPI wrappers to compile your program
• mpicc, mpiCC, mpif90, mpif77
• the wrappers will find the appropriate include files and libraries and then invoke the actual compiler
• for example, mpicc will invoke either gcc, pgcc or icc depending upon which module you have loaded

Compiling on Killdevil/Kure
Parallel Programming
MPI (see previous page)
OpenMP
• Compiler flag: -openmp for Intel, -fopenmp for GNU, -mp for PGI
• Must set OMP_NUM_THREADS in submission script
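
A submission script that sets OMP_NUM_THREADS might look like the one generated below. This is a sketch under stated assumptions: the executable name myOpenMPExe, the source file name in the comment, the choice of 8 threads, and the use of a bash top line are all placeholders rather than values from the slides. The snippet only writes the file and prints it; it does not submit anything.

```shell
# Write a sample LSF submission script for an OpenMP program.
cat > openmp.bsub <<'EOF'
#!/bin/bash
#BSUB -q week
#BSUB -o out.%J
#BSUB -n 8
#BSUB -R "span[ptile=8]"

# Program assumed to be built beforehand with an OpenMP flag, e.g.
#   gcc -fopenmp -o myOpenMPExe myprog.c
# Use one thread per requested core, with all 8 cores on one node.
export OMP_NUM_THREADS=8
./myOpenMPExe
EOF

# On a login node the script would then be submitted with:
#   bsub < openmp.bsub
cat openmp.bsub
```

Here -n 8 together with span[ptile=8] asks LSF for 8 job slots on a single host, which matches the shared memory model of OpenMP.
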
Job Scheduling and Management

What does a Job Scheduler and batch system do?
Manage Resources
• allocate user tasks to resources
• monitor tasks
• process control
• manage input and output
• report status, availability, etc
• enforce usage policies

Job Scheduling Systems
Allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc.
Many types of schedulers
• Load Sharing Facility (LSF) – Used by Killdevil/Kure
• IBM LoadLeveler
• Portable Batch System (PBS)
• Sun Grid Engine (SGE)
• Simple Linux Utility for Resource Management (SLURM)

LSF
All Research Computing clusters use LSF to do job scheduling and management
LSF (Load Sharing Facility) is a (licensed) product from Platform Computing (now owned by IBM)
• Fairly distribute compute nodes among users
• enforce usage policies for established queues
• most common queues: int, now, week, month
• RC uses Fair Share scheduling, not first come, first served (FCFS)
LSF commands typically start with the letter b (as in batch), e.g. bsub, bqueues, bjobs, bhosts, …
• see man pages for much more info!

Simplified view of LSF
[Diagram: a user logged in to the login node submits a job with "bsub -q week myjob"; the job is routed to a queue, where it waits with the other queued jobs (job_J, job_F, myjob, job_7); it is then dispatched to run on an available host which satisfies the job requirements.]

Running Programs on Killdevil
Upon ssh to Killdevil/Kure, you are on the Login node.
Programs SHOULD NOT be run on the Login node.
Submit programs to one of the many, many compute nodes.
Submit jobs using Load Sharing Facility (LSF) via the bsub command.

Common batch commands
bsub – submit jobs
bqueues – view info on defined queues
• bqueues -l week
bkill – stop/cancel submitted job
bjobs – view submitted jobs
• bjobs -u all
bhist – job history
• bhist -l <jobID>

Common batch commands
bhosts – status and resources of hosts (nodes)
bpeek – display output of running job
Use man pages to get much more info!
• man bjobs

Submitting Jobs: bsub Command
Submit Jobs - bsub
• Run large jobs out of scratch space; smaller jobs can run out of your home space
bsub [-bsub_opts] executable [-exec_opts]
Common bsub options:
• -o <filename>, e.g. -o out.%J
• -q <queue name>, e.g. -q week
• -R "resource specification", e.g. -R "span[ptile=8]"
• -n <number of processes>, used for parallel, MPI jobs

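To show how these options combine, the sketch below assembles a bsub command line and prints it instead of submitting it, so it can be tried on any machine. The helper name make_bsub_cmd and the executable name myParallelExe are illustrative, not part of LSF.

```shell
# Assemble a bsub command line from the common options and print it.
# Nothing is submitted; on a login node the printed command would be run.
make_bsub_cmd() (
    queue=$1; nprocs=$2; ptile=$3; shift 3
    printf '%s\n' "bsub -q $queue -o out.%J -n $nprocs -R \"span[ptile=$ptile]\" $*"
)

# 16 MPI processes, 8 per node, in the week queue.
make_bsub_cmd week 16 8 mpirun myParallelExe
```

In the -R string, span[ptile=8] asks LSF to place 8 of the requested processes on each host, so 16 processes would be spread over 2 nodes.
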
Two methods to submit jobs:
bsub example: submit the executable job, myexe, to the week queue and redirect output to the file out.<jobID> (default is to mail output)
Method 1: Command Line
• bsub -q week -o out.%J myexe
Method 2: Create a file (details to follow) called, for example, myexe.bsub, and then submit that file. Note the redirect symbol, <
• bsub < myexe.bsub

Method 2 cont.
The file you submit will contain all the bsub options you want, so for this example myexe.bsub will look like this
#BSUB -q week
#BSUB -o out.%J
myexe
This is actually a shell script, so the top line could be the normal #!/bin/csh, etc., and you can run any commands you would like.
• if this doesn’t mean anything to you then never mind :)

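Because the submitted file is an ordinary shell script, it can carry extra commands around the program. The sketch below writes such a file and counts its #BSUB lines; the bash top line and the echo commands are additions for illustration, not part of the original example.

```shell
# Write a batch file that wraps the program in ordinary shell commands.
cat > myexe.bsub <<'EOF'
#!/bin/bash
#BSUB -q week
#BSUB -o out.%J

# Any shell commands can appear before and after the program itself.
echo "Job started on $(hostname) at $(date)"
myexe
echo "Job finished at $(date)"
EOF

# Submit from a login node with:
#   bsub < myexe.bsub
grep -c '^#BSUB' myexe.bsub
```

The lines starting with #BSUB are comments to the shell but options to bsub, which is why the same file works both as a script and as a job description.
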
Parallel Job example
Batch Command Line Method
• bsub -q week -o out.%J -n 64 mpirun myParExe
Batch File Method
• bsub < myexe.bsub
• where myexe.bsub will look like this
#BSUB -q week
#BSUB -o out.%J
#BSUB -n 64
mpirun myParallelExe

Minor Killdevil caveats
Memory limits: Killdevil has a default memory limit of 4 GB for a job. If you need more than the default, use the "-M" LSF option:
bsub -q week -o out.%J -M 9 myExe
PI groups: On Killdevil, when you submit a job, make sure you use the correct PI group (only applicable if you belong to more than one PI group) by using the "-G" LSF option:
bsub -q week -G itsrc_grp myExe

Minor Killdevil caveats (cont’d)
Using the correct PI group is important for the bookkeeping we do on cluster usage by the PI groups
To check the PI groups to which you belong:
bugroup | grep <onyen>

Interactive Jobs
To run long shell scripts on Kure, use the int (interactive) queue
• bsub -q int -Ip /bin/bash
• This bsub command provides a prompt on a compute node
• You can run a program or shell script interactively from the compute node
On Killdevil, use the hour or day queue as needed
• bsub -q hour -Ip /bin/bash

Specialty Scripts
There are specialty scripts provided on Kure for user convenience.
Batch scripts
• bmatlab, bsas, bstata
X-window scripts
• xmatlab, xsas, xstata
Interactive scripts
• imatlab, istata
Killdevil only provides the *matlab scripts listed above

MPI/OpenMP Training
Courses are taught throughout the year by Research Computing
• http://learnit.unc.edu/workshops
• http://help.unc.edu/CCM3_008194
See schedule for next course
• MPI
• OpenMP

Further Help with Killdevil/Kure
More details can be found on the Getting Started help documents:
• http://help.unc.edu/CCM3_031537 - Killdevil
• http://help.unc.edu/ccm3_015682 - Kure
For assistance with Killdevil/Kure, please contact the ITS Research Computing group
• Email: [email protected]
• Phone: 919-962-HELP
• Submit help ticket at http://help.unc.edu
For immediate assistance on a particular command, see the manual pages
• man <command>