Batch Queuing Systems
The Portable Batch System (PBS) and the Load Sharing Facility (LSF) queuing systems share much common functionality in running batch jobs. However, they differ in their implementation of the batch environment and their user commands.
Table 1 below provides a comparative list of command options to help users migrating from LSF (used on halem) to PBS (used on palm and discover).
Option                          LSF (halem)                       PBS (palm and discover)
------------------------------  --------------------------------  -----------------------------------
Resource directive sentinel     #BSUB                             #PBS
Number of nodes/processors      -n <#> (nodes)                    On palm: -l ncpus=<#> (processors)
                                                                  On discover: -l select=<#> (nodes)
Wall clock limit                -W hh:mm                          -l walltime=hh:mm:ss
Queue                           -q <queue>                        -q <queue>
Email notification              -B sends mail when job begins     -m b sends mail when job begins
                                -N sends job report when finished -m e sends mail when job ends
Email address                   -u <email_address>                -M <email_address>
Initial directory               (default = job submission         (default = $HOME)
                                directory)
Job name                        -J <name>                         -N <name>
STDOUT                          -o <file_name>                    -o <file_name>
STDERR                          -e <file_name>                    -e <file_name>
STDERR and STDOUT to same file  (use -o without -e)               -j oe (both to STDOUT)
                                                                  -j eo (both to STDERR)
Project to charge               -P <project>                      -W group_list=<project>

Table 1: Syntax for frequently used options
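One detail in Table 1 that is easy to miss when porting a script is the wall clock format: LSF takes hh:mm (or plain minutes), while PBS expects hh:mm:ss. The following is a minimal sketch of that conversion in POSIX sh (not csh, which the example scripts use). The function name lsf_to_pbs_walltime is a hypothetical helper, not part of either batch system.

```shell
# Hypothetical helper: convert an LSF "-W" limit (hh:mm or plain minutes)
# into the hh:mm:ss form used by PBS "-l walltime=".
lsf_to_pbs_walltime() {
    limit=$1
    case "$limit" in
        *:*) hours=${limit%%:*}; minutes=${limit#*:} ;;
        *)   hours=0; minutes=$limit ;;
    esac
    # Strip leading zeros so the shell does not treat "08" as octal.
    hours=$(printf '%s' "$hours" | sed 's/^0*//')
    minutes=$(printf '%s' "$minutes" | sed 's/^0*//')
    total=$(( ${hours:-0} * 60 + ${minutes:-0} ))
    printf '%d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

lsf_to_pbs_walltime 6:00    # prints 6:00:00
```

The result can be pasted into a directive such as #PBS -l walltime=6:00:00.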
Batch Job Management
The following table compares commonly used LSF and PBS commands for controlling and monitoring jobs.

Command        LSF on halem     PBS on palm or discover
-------------  ---------------  -----------------------
Submission     bsub             qsub
Deletion       bkill            qdel
Status         bjobs            qstat
Queue list     bqueues -l       qstat -Q
GUI monitor    (none listed)    xpbsmon

Table 2: Frequently used job management commands (check man pages of each command for more information)
Environment Variables
Both LSF and PBS provide special environment variables that simplify scripting and configuration of batch jobs.

Variable                  LSF on halem    PBS on palm or discover
------------------------  --------------  -----------------------
Processor list            $LSB_HOSTS      cat $PBS_NODEFILE
Directory of submission   $LS_SUBCWD      $PBS_O_WORKDIR
Job ID                    $LSB_JOBID      $PBS_JOBID

Table 3: Useful environment variables
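A script that must run under either system can test which set of variables from Table 3 is defined. The sketch below is written in POSIX sh rather than csh. The function names batch_job_id and batch_submit_dir are hypothetical, and the fallback values ("none" and the current directory) are assumptions made for illustration.

```shell
# Hypothetical helpers: report the job ID and submission directory
# from whichever batch system (PBS or LSF) started the script.
batch_job_id() {
    if [ -n "${PBS_JOBID:-}" ]; then
        echo "$PBS_JOBID"
    elif [ -n "${LSB_JOBID:-}" ]; then
        echo "$LSB_JOBID"
    else
        echo "none"          # assumed fallback when run outside a batch job
    fi
}

batch_submit_dir() {
    if [ -n "${PBS_O_WORKDIR:-}" ]; then
        echo "$PBS_O_WORKDIR"
    elif [ -n "${LS_SUBCWD:-}" ]; then
        echo "$LS_SUBCWD"
    else
        pwd                  # assumed fallback: current directory
    fi
}

echo "Job ID: $(batch_job_id)"
echo "Submitted from: $(batch_submit_dir)"
```

PBS is checked first here only as a convention; on a system running a single scheduler the order does not matter.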
Example Batch Scripts
The following simple LSF and PBS submission scripts compare how the batch systems request comparable resources and run the same parallel executable:
LSF example:
#!/bin/csh
#BSUB -n 4
#BSUB -W 6:00
#BSUB -q special_b
#BSUB -J myJobName
#BSUB -o out.o%J
#BSUB -u <email_address>
#BSUB -P k1234
echo "Master Host: `hostname` "
echo "Node List: $LSB_HOSTS "
cd $LS_SUBCWD
prun -n 16 ./mpihello
To submit the job, type:
bsub < script_name
PBS example:
#!/bin/csh
# On discover, request 4 nodes with 4 CPUs each:
#PBS -l select=4:ncpus=4
# On palm, use this line instead of the one above:
#PBS -l ncpus=16
#PBS -l walltime=6:00:00
#PBS -q general
#PBS -N myJobName
#PBS -j oe
#PBS -m e -M <email_address>
#PBS -W group_list=k1234
echo "Master Host: $PBS_O_HOST"
echo "Nodes:"; cat -n $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun -np 16 ./mpihello
To submit the job, type:
qsub script_name
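The PBS script above prints the node file with cat -n. As a rough sketch, the same file can also be summarized to check that the job received the resources it asked for. This assumes the file named by $PBS_NODEFILE holds one host name per line, with a host repeated once per allocated slot; the exact layout depends on the site's PBS configuration. The function names count_slots and count_nodes are hypothetical, and the code is POSIX sh rather than csh.

```shell
# Hypothetical helpers for summarizing a PBS node file.
# Assumes one host name per line, repeated once per allocated slot.
count_slots() {
    # Total number of lines, i.e. allocated slots.
    wc -l < "$1" | tr -d ' '
}

count_nodes() {
    # Number of distinct host names.
    sort -u "$1" | wc -l | tr -d ' '
}

# Inside a PBS job, the summary could be printed like this:
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Slots: $(count_slots "$PBS_NODEFILE")"
    echo "Nodes: $(count_nodes "$PBS_NODEFILE")"
fi
```

Outside a batch job the final block prints nothing, because $PBS_NODEFILE is not set.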
Interactive Batch
Both queuing systems can enter an interactive batch mode, commonly used for debugging, with the -Is (LSF) or -I (PBS) option. The other options are the same as shown above, but they are entered on a single command line. Commands for the two queuing systems are compared below:
LSF example (halem):
% bsub -Is -Pk1234 -qspecial_b -W6:00 -n4 /usr/dlocal/bin/tcsh
When the requested processors are available, the interactive prompt will appear:
bsub> cd $LS_SUBCWD
bsub> prun -n 16 ./mpihello
bsub> exit
PBS example (discover or palm):
on discover:
% qsub -I -W group_list=k1234 -q general -l walltime=06:00:00,select=4:ncpus=4
or on palm:
% qsub -I -W group_list=k1234 -q general -l walltime=06:00:00,ncpus=16
When the requested processors are available, the interactive prompt will appear:
% cd $PBS_O_WORKDIR
% mpirun -np 16 ./mpihello
% exit