Batch Queuing Systems

The Portable Batch System (PBS) and the Load Sharing Facility (LSF) queuing systems provide much the same functionality for running batch jobs. However, they differ in how they implement the batch environment and in their user commands.

Table 1 below provides a comparative list of command options to help users migrating from LSF (used on halem) to PBS (used on palm and discover).

Option                        LSF (halem)                             PBS (palm and discover)
----------------------------  --------------------------------------  ----------------------------------
Resource directive sentinel   #BSUB                                   #PBS
Number of nodes/processors    -n <#> (nodes)                          On palm: -l ncpus=<#> (processors)
                                                                      On discover: -l select=<#> (nodes)
Wall clock limit              -W hh:mm                                -l walltime=hh:mm:ss
Queue                         -q <queue>                              -q <queue>
Email notification            -B sends mail when job begins           -m b sends mail when job begins
                              -N sends job report when finished       -m e sends mail when job ends
Email address                 -u <email_address>                      -M <email_address>
Initial directory             (default = job submission directory)    (default = $HOME)
Job name                      -J <name>                               -N <name>
STDOUT                        -o <file_name>                          -o <file_name>
STDERR                        -e <file_name>                          -e <file_name>
STDERR & STDOUT to same file  (use -o without -e)                     -j oe (both to STDOUT)
                                                                      -j eo (both to STDERR)
Project to charge             -P <project>                            -W group_list=<project>

Table 1: Syntax for frequently used options
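
One detail in Table 1 that is easy to miss when converting a script is the wall clock format: LSF takes hh:mm, while PBS takes hh:mm:ss. The following is a minimal sketch of that conversion, written in POSIX sh rather than the csh used in the example scripts. The function name lsf_walltime_to_pbs is hypothetical, and the sketch assumes the limit is always given in the hh:mm form listed in the table.

```shell
# Convert an LSF wall clock limit (hh:mm) to the PBS form (hh:mm:ss).
# The function name is illustrative; it is not part of LSF or PBS.
lsf_walltime_to_pbs() {
    case "$1" in
        # Reject empty input, stray characters, extra or misplaced colons.
        ''|*[!0-9:]*|*:*:*|:*|*:)
            echo "expected hh:mm, got '$1'" >&2
            return 1 ;;
        # Exactly one colon with digits on both sides: append the seconds field.
        *:*)
            printf '%s:00\n' "$1" ;;
        # Digits only (no colon) is not the hh:mm form assumed here.
        *)
            echo "expected hh:mm, got '$1'" >&2
            return 1 ;;
    esac
}
```

For example, the value from "#BSUB -W 6:00" would be passed as lsf_walltime_to_pbs 6:00 and the result placed after "-l walltime=" in the PBS script.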

Batch Job Management

The following table compares commonly used LSF and PBS commands for controlling and monitoring jobs.

Action         LSF on halem    PBS on palm or discover
-------------  --------------  -----------------------
Submission     bsub            qsub
Deletion       bkill           qdel
Status         bjobs           qstat
Queue list     bqueues -l      qstat -Q
GUI monitor                    xpbsmon

Table 2: Frequently used job management commands (check the man page of each command for more information)
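
For users who move between the two systems, the mapping in Table 2 can be captured in a small lookup helper. The sketch below is in POSIX sh rather than csh; the function name job_command and the action keywords (submit, delete, status, queues) are made up for illustration, and only the command names themselves come from the table.

```shell
# Print the Table 2 command for a given system ("lsf" or "pbs") and action.
# The function name and action keywords are hypothetical.
job_command() {
    system=$1
    action=$2
    case "$system:$action" in
        lsf:submit) echo "bsub" ;;
        lsf:delete) echo "bkill" ;;
        lsf:status) echo "bjobs" ;;
        lsf:queues) echo "bqueues -l" ;;
        pbs:submit) echo "qsub" ;;
        pbs:delete) echo "qdel" ;;
        pbs:status) echo "qstat" ;;
        pbs:queues) echo "qstat -Q" ;;
        *)
            echo "unknown system or action: $system $action" >&2
            return 1 ;;
    esac
}
```

The helper only prints the command name, so it can be tried on a machine where neither queuing system is installed.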

Environment Variables

Both LSF and PBS provide special environment variables, which simplify scripting and configuration of batch jobs.

Variable                  LSF on halem    PBS on palm or discover
------------------------  --------------  -----------------------
Processor list            $LSB_HOSTS      cat $PBS_NODEFILE
Directory of submission   $LS_SUBCWD      $PBS_O_WORKDIR
Job ID                    $LSB_JOBID      $PBS_JOBID

Table 3: Useful environment variables
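
The processor list in Table 3 takes a different shape on each system: $LSB_HOSTS holds the host names directly, while $PBS_NODEFILE names a file that contains them. The sketch below counts the allocated processor slots on either system. It is written in POSIX sh rather than csh, the function name count_slots is hypothetical, and it assumes one host name per slot (space-separated in $LSB_HOSTS, one per line in the PBS node file); the exact contents of the node file depend on the PBS version and the resource request.

```shell
# Print the number of processor slots allocated to the current job.
# The function name is illustrative; it is not part of LSF or PBS.
count_slots() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # PBS: assume one line per slot; the arithmetic strips any padding from wc.
        echo $(( $(wc -l < "$PBS_NODEFILE") ))
    elif [ -n "${LSB_HOSTS:-}" ]; then
        # LSF: assume space-separated host names, one per slot.
        set -- $LSB_HOSTS
        echo $#
    else
        echo "no PBS_NODEFILE or LSB_HOSTS found; not inside a batch job?" >&2
        return 1
    fi
}
```

Inside a job script, the result could be used in place of a hard-coded process count when launching the parallel executable.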

Example Batch Scripts

The following simple LSF and PBS submission scripts show how each batch system requests comparable resources and runs the same parallel executable.

LSF example:

    #!/bin/csh
    # Resource requests and job options (see Table 1)
    #BSUB -n 4
    #BSUB -W 6:00
    #BSUB -q special_b
    #BSUB -J myJobName
    #BSUB -o out.o%J
    #BSUB -u <email_address>
    #BSUB -P k1234

    echo "Master Host: `hostname` "
    echo "Node List: $LSB_HOSTS "

    # Run from the directory the job was submitted from
    cd $LS_SUBCWD
    prun -n 16 ./mpihello

To submit the job, type:

    bsub < script_name

PBS example:

    #!/bin/csh
    # Resource request for discover: 4 nodes with 4 CPUs each.
    # On palm, replace the next line with: #PBS -l ncpus=16
    #PBS -l select=4:ncpus=4
    #PBS -l walltime=6:00:00
    #PBS -q general
    #PBS -N myJobName
    #PBS -j oe
    #PBS -m e -M <email_address>
    #PBS -W group_list=k1234

    echo "Master Host: $PBS_O_HOST"
    echo "Nodes:"; cat -n $PBS_NODEFILE

    # PBS starts the job in $HOME, so change to the submission directory
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./mpihello

To submit the job, type:

    qsub script_name

Interactive Batch

Both queuing systems can run a job in interactive batch mode, commonly used for debugging, through the -Is (LSF) or -I (PBS) option. The other options are the same as those shown previously, but are entered on a single command line. The commands for the two queuing systems are compared below.

LSF example (halem):

    % bsub -Is -Pk1234 -qspecial_b -W6:00 -n4 /usr/dlocal/bin/tcsh

When the requested processors are available, the interactive prompt will appear:

    bsub> cd $LS_SUBCWD
    bsub> prun -n 16 ./mpihello
    bsub> exit

PBS example (discover or palm):

On discover:

    % qsub -I -W group_list=k1234 -q general -l walltime=06:00:00,select=4:ncpus=4

On palm:

    % qsub -I -W group_list=k1234 -q general -l walltime=06:00:00,ncpus=16

When the requested processors are available, the interactive prompt will appear:

    % cd $PBS_O_WORKDIR
    % mpirun -np 16 ./mpihello
    % exit