Resource Management and Job Scheduling · 2016-08-09
18 – 22 May 2015
Resource Management and
Job Scheduling
Jenett Tillotson
Senior Cluster System Administrator
Indiana University
Resource Managers
● Keep track of resources
– Nodes: CPUs, disk, memory, swap, load, etc.
– Network, licenses, storage, etc.
● Keep track of requests
– Jobs, queues, etc.
● Control jobs which use these resources
– Stop, hold, cancel, monitor, etc.
Job Scheduler
● What jobs run on what resources
● Pretty complicated
– Quality of Service/Service Level Agreements
– Avoid job starvation
– Job placement
● Maximize good stuff
● Minimize bad stuff
TORQUE
● Terascale Open-source Resource and QUEue Manager
● Portable Batch System (PBS), NASA, 1991
● OpenPBS, open source, 1998
● PBSPro, commercial product
● TORQUE, open source, 2003
● Hosted and developed by Adaptive Computing
Moab
● Maui, mid 1990s, open sourced 2000
● Moab, commercial product, 2001
● Dave Jackson, creator of Maui/Moab
– Started Cluster Resources
– Now Adaptive Computing
Torque Topology Diagram
Master Node
● pbs_server provides:
– Node tracking
– Queues and queuing policies
– Storage for job scripts and tracking of jobs
– Usage and event logs
● pbs_sched: FIFO scheduler
Compute Nodes
● pbs_mom: Machine Oriented Mini-server
– Starts the job on the compute resources
– Monitors resource utilization
– Notifies pbs_server of job events
– Facilitates multi-node jobs
– Spools stdout and stderr
● Mother Superior and sister MOMs
Submit Nodes
● TORQUE client
– qsub, qdel, qhold/qrls, qstat, qalter
● trqauthd: TORQUE Authorization Daemon
– Runs on all nodes
All nodes
Job Flow
Installation
● Requires libxml2-devel, openssl-devel, Tcl/Tk for the (optional) GUI, libhwloc for (optional) cpusets, gcc, gcc-c++, make, libtool, boost-devel
● configure; make; make install
– make install_mom, make install_client, make install_server
● make rpm -or- make packages
Configuring TORQUE
● ./configure options:
– --prefix=/usr/local/
– --with-server-home=/var/spool/torque/
– --with-default-server=$HOSTNAME
● pbs_server: /var/spool/torque/server_priv/nodes
● pbs_mom: /var/spool/torque/mom_priv/config
● /var/spool/torque/server_name
/var/spool/torque/server_priv/nodes:
node1 np=16 prop1 prop2
node2 np=16 prop1
node3 np=32 prop3 prop2
node4 np=16 prop1 prop2
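Each line of the nodes file above names a node, gives its processor count as np=, and lists its properties. As a quick sanity check, the total number of slots the server has been told about can be summed from the file. This is a minimal sketch: total_np is a hypothetical helper name, and skipping lines that begin with # is an assumption about how comments appear in the file.

```shell
#!/bin/bash
# total_np FILE
# Sum the np= values in a nodes file (hypothetical helper, not a TORQUE tool).
# Assumption: blank lines and lines starting with '#' carry no node data.
total_np() {
    awk '
        /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++)           # field 1 is the node name
                if ($i ~ /^np=[0-9]+$/) {
                    split($i, kv, "=")
                    total += kv[2]
                }
        }
        END { print total + 0 }
    ' "$1"
}
```

For the four-node file shown above, `total_np /var/spool/torque/server_priv/nodes` would print 80.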
/var/spool/torque/mom_priv/config:
$loglevel 3
$spool_as_final_name true
$usecp *:/N/home /N/home
$usecp *:/N/dc2 /N/dc2
/var/spool/torque/server_name:
myresmgr.domain.edu
Running TORQUE
● Startup the first time:
– pbs_server -t create
– pbs_mom, trqauthd
● Startup scripts are in $BUILD_DIR/contrib/
● Testing
– pbsnodes
– qmgr
– /var/spool/torque/server_logs
– /var/spool/torque/mom_logs
Security
● Compute nodes and submit hosts must be able to talk to port 15001 on the pbs_server
● pbs_server must be able to talk to port 15002 on the compute nodes
● The compute nodes must be able to talk to port 15003 on the other compute nodes
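The three port rules above can be written down as a small checklist and probed from the relevant hosts. This is a sketch only: torque_required_flows and port_reachable are hypothetical helper names, the host-group labels are descriptive rather than real hostnames, and the probe assumes a bash build with /dev/tcp support plus the coreutils timeout command.

```shell
#!/bin/bash
# Print one "source -> destination:port" line per flow listed on the slide.
# The labels on each side are descriptive group names, not hostnames.
torque_required_flows() {
    cat <<'EOF'
compute,submit -> pbs_server:15001
pbs_server -> compute:15002
compute -> compute:15003
EOF
}

# port_reachable HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within two seconds.
# Assumption: bash was built with /dev/tcp redirection enabled.
port_reachable() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

From a compute node, for example, `port_reachable myresmgr 15001 && echo ok` would confirm the first rule, with myresmgr standing in for the pbs_server host.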
TORQUE Configuration - qmgr
create queue foo
set queue foo queue_type = Execution
set queue foo resources_max.nodes = 32
set queue foo resources_max.walltime = 24:00:00
set queue foo resources_default.nodes = 1
set queue foo resources_default.walltime = 1:00:00
set queue foo enabled = True
set queue foo started = True
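When several queues share the same shape, the qmgr commands above can be generated rather than typed. The following is a minimal sketch with a hypothetical helper name (make_queue_cmds); it only prints the commands, and it covers a subset of the attributes shown above.

```shell
#!/bin/bash
# make_queue_cmds NAME MAX_NODES MAX_WALLTIME
# Print the qmgr commands that define an execution queue.
# Hypothetical helper: it prints text only and does not contact pbs_server.
make_queue_cmds() {
    cat <<EOF
create queue $1
set queue $1 queue_type = Execution
set queue $1 resources_max.nodes = $2
set queue $1 resources_max.walltime = $3
set queue $1 enabled = True
set queue $1 started = True
EOF
}
```

Because qmgr reads commands from standard input, the output can be reviewed first and then applied with `make_queue_cmds foo 32 24:00:00 | qmgr`.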
TORQUE Configuration (cont.)
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = myresmgr
set server managers = root@myresmgr
set server operators = root@myresmgr
set server submit_hosts = mysubmithost
TORQUE Configuration (cont.)
set server default_queue = foo
set server log_events = 511
set server mail_from = adm
set server node_check_rate = 150
set server tcp_timeout = 6
TORQUE Commands
● qstat: Used to query the resource manager. Common usage:
– “qstat -f $JOBID”: displays full info for $JOBID.
– “qstat -a”: displays all jobs.
– “qstat -q”: displays queue status.
– “qstat -Qf”: displays queue definitions.
TORQUE Commands (cont.)
● pbsnodes: command used to query the state of nodes, or to mark a node offline or online.
– “pbsnodes -o $NODE”: sets $NODE offline
– “pbsnodes -r $NODE”: clears the offline state
– “pbsnodes -l”: lists all nodes that are down or offline
– “pbsnodes -l $STATE”: lists all nodes in state $STATE
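The listing produced by pbsnodes -l is easy to post-process in scripts. The sketch below assumes each output line has the form "nodename state[,state...]", which should be checked against the installed version; nodes_in_state is a hypothetical helper name.

```shell
#!/bin/bash
# nodes_in_state STATE
# Read "name state[,state...]" lines on stdin and print the names of the
# nodes whose comma-separated state list contains STATE exactly.
# Assumption: this two-column layout matches the local pbsnodes -l output.
nodes_in_state() {
    awk -v want="$1" '
        NF >= 2 {
            n = split($2, states, ",")
            for (i = 1; i <= n; i++)
                if (states[i] == want) { print $1; break }
        }
    '
}
```

A typical use would be `pbsnodes -l | nodes_in_state offline`, which prints one node name per line.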
Job Script
#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=2:00:00
#PBS -N myjobname
#PBS -m bea
#PBS -M [email protected]
#PBS -j oe
#PBS -k o
#PBS -V
#PBS -q foo
cd $PBS_O_WORKDIR
./runmyjob
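Scripts like the one above are often generated when only the resource request changes between runs. A minimal sketch of such a generator follows; make_job_script is a hypothetical helper name and it emits only a subset of the directives shown above.

```shell
#!/bin/bash
# make_job_script NODES PPN WALLTIME QUEUE COMMAND
# Print a small job script with the given resource request.
# Hypothetical helper: it prints text only and does not submit anything.
make_job_script() {
    cat <<EOF
#!/bin/bash
#PBS -l nodes=$1:ppn=$2
#PBS -l walltime=$3
#PBS -q $4
cd \$PBS_O_WORKDIR
$5
EOF
}
```

The output can be saved to a file and submitted, or piped straight to qsub, which reads the script from standard input when no file is given: `make_job_script 2 16 2:00:00 foo ./runmyjob | qsub`.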
TORQUE Directives
● -l resource requests
● -N job name
● -m when to mail (b: start, e: end, a: abort, n: none)
● -M where to mail
● -j join output streams
● -k keep output stream
● -V copy submission environment to compute node
● -q queue to submit to
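Walltime requests passed with -l are written as HH:MM:SS in this deck. When comparing a request against a queue limit in a wrapper script, it helps to convert it to seconds first. This is a sketch that handles only the three-field form; walltime_to_seconds is a hypothetical helper name.

```shell
#!/bin/bash
# walltime_to_seconds HH:MM:SS
# Print the walltime in seconds. Prints nothing if the argument does not
# have exactly three colon-separated fields (other forms are not handled).
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: 'NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}
```

For example, `walltime_to_seconds 2:00:00` prints 7200.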
Job Environment Variables
● PBS_O_HOST - The machine that submitted the job.
● PBS_O_LOGNAME - The user who submitted the job.
● PBS_O_HOME - The home directory of the user who submitted the job.
● PBS_O_WORKDIR - The working directory from where the qsub was run.
● PBS_ENVIRONMENT - Set to PBS_BATCH for batch jobs and to PBS_INTERACTIVE for interactive jobs.
● PBS_O_QUEUE - The original queue to which the job was submitted.
● PBS_JOBID - The identifier that PBS assigns to the job.
● PBS_JOBNAME - The name of the job.
● PBS_NODEFILE - The file which contains the list of nodes assigned to the job.
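Inside a job, the file named by PBS_NODEFILE is the usual way to find out what was allocated. The sketch below assumes the common layout of one line per assigned processor slot, so a node requested with ppn=16 appears 16 times; nodefile_summary is a hypothetical helper name.

```shell
#!/bin/bash
# nodefile_summary FILE
# Print "<unique nodes> <total slots>" for a node file.
# Assumption: the file has one hostname per line, repeated once per slot.
nodefile_summary() {
    awk 'NF { slots++; if (!seen[$1]++) nodes++ }
         END { print nodes + 0, slots + 0 }' "$1"
}
```

In a job script this would be called as `nodefile_summary "$PBS_NODEFILE"`, and the second number is what is typically passed to an MPI launcher as the process count.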
Job Control
● qsub – submit a job to the queues
● qdel – delete a job from the queues
● qhold – put a job on hold
● qrls – release a hold
● qstat – job status
● qalter – alter the attributes of an idle job
Submitting a Job
● qsub -I
– Submits an interactive job
● qsub $JOB_SCRIPT_FILE
● qsub -l nodes=1:ppn=16 -l walltime=2:00:00 -q foo -N myname $JOB_SCRIPT_FILE
● Directives on the command line will override the directives in the job script
● Jobs are spooled in /var/spool/torque/server_priv/jobs
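qsub prints the identifier of the new job on standard output, in the form sequence-number.server-name. Scripts that go on to call qstat or tracejob often want just the leading number. A minimal sketch, with job_number as a hypothetical helper name:

```shell
#!/bin/bash
# job_number JOBID
# Print the part of a job identifier before the first dot,
# e.g. "12345.myresmgr.domain.edu" becomes "12345".
job_number() {
    printf '%s\n' "${1%%.*}"
}
```

A wrapper could capture the identifier with `jobid=$(qsub "$JOB_SCRIPT_FILE")` and then use `job_number "$jobid"` wherever the short form is needed.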
Job Scheduling
● “pbs_sched”: Simple FIFO scheduler
● “qrun”
Terminating TORQUE
● “qterm -t quick”: Leave jobs running
● “qterm -t immediate”: Terminate all jobs as well
Troubleshooting
● “tracejob -n $NUMB_OF_DAYS $JOB_ID”
● Logs
– /var/spool/torque/server_logs
– /var/spool/torque/mom_logs
– /var/spool/torque/client_logs
– /var/spool/torque/server_priv/accounting
– /var/spool/torque/job_logs
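When reading the logs by hand, it helps to jump straight to the file for a given day. The sketch below assumes the log files in these directories are named by date as YYYYMMDD, which should be confirmed on the local installation. torque_log_path is a hypothetical helper, and TORQUE_HOME is used here only as an optional shell variable for overriding the /var/spool/torque prefix.

```shell
#!/bin/bash
# torque_log_path LOGDIR [YYYYMMDD]
# Print the path of the log file for the given day (default: today).
# Assumptions: files are named YYYYMMDD, and the spool directory is
# /var/spool/torque unless the TORQUE_HOME shell variable says otherwise.
torque_log_path() {
    printf '%s/%s/%s\n' "${TORQUE_HOME:-/var/spool/torque}" "$1" "${2:-$(date +%Y%m%d)}"
}
```

For example, `less "$(torque_log_path mom_logs)"` on a compute node would open today's pbs_mom log under that naming assumption.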
Moab Workload Manager
Installation
● Download from Adaptive Computing
● Requires libcurl, perl, perl-cpan, libxml2-devel, torque
● configure; make; make install
● Configure options
– --prefix=/opt/moab
– --with-homedir=/opt/moab
– --with-serverhost=$HOSTNAME
– --with-torque=/usr/local
moab.cfg
SCHEDCFG[mysched] SERVER=mysched:42559
ADMINCFG[1] USERS=root
ADMINCFG[3] USERS=all
RMCFG[myresmgr] TYPE=PBS
RMCFG[myresmgr] SUBMITCMD=/usr/local/bin/qsub
RMCFG[myresmgr] TIMEOUT=00:05:00
moab.cfg
LOGLEVEL 3
LOGFILEMAXSIZE 100000000
LOGFILEROLLDEPTH 10
RMPOLLINTERVAL 15
DISABLESCHEDULING TRUE
moab.cfg
JOBNODEMATCHPOLICY EXACTNODE
NODEALLOCATIONPOLICY PRIORITY
NODEACCESSPOLICY SINGLEJOB
JOBREJECTPOLICY HOLD
DEFERTIME 00:15:00
DEFERCOUNT 5
JOBACTIONONNODEFAILURE REQUEUE
moab.cfg
PROCWEIGHT 10
XFACTORWEIGHT 1000
FSWEIGHT 3
FSUSERWEIGHT 1000
FSPOLICY DEDICATEDPS
FSDEPTH 7
FSINTERVAL 24:00:00
FSDECAY 0.80
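The FSDEPTH, FSINTERVAL, and FSDECAY settings above describe a sliding usage history: usage is tracked in 24-hour windows, seven windows are kept, and each older window counts 0.80 times as much as the one after it. The sketch below illustrates only that decay weighting, as a plain weighted sum. It is an assumption about the general shape of the calculation, not a reproduction of how Moab normalizes or combines fairshare values, and decayed_usage is a hypothetical helper name.

```shell
#!/bin/bash
# decayed_usage DECAY USAGE_NEWEST [USAGE_OLDER ...]
# Print sum(usage_i * DECAY^i) with i = 0 for the newest window,
# rounded to two decimals. Illustration only; not Moab's own code.
decayed_usage() {
    fs_decay=$1
    shift
    printf '%s\n' "$@" | awk -v d="$fs_decay" '
        BEGIN { w = 1 }
        { total += $1 * w; w *= d }     # the weight shrinks for each older window
        END { printf "%.2f\n", total }'
}
```

With three windows of 100 units each and a decay of 0.80, `decayed_usage 0.80 100 100 100` gives 100 + 80 + 64 = 244.00.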
moab.cfg
RESERVATIONPOLICY CURRENTHIGHEST
RESERVATIONDEPTH 10
BACKFILLPOLICY FIRSTFIT
moab.cfg
USERCFG[DEFAULT] FSTARGET=10.0
USERCFG[DEFAULT] MAXIJOBS=16
CLASSCFG[foo] HOSTLIST=node1[0-9]$
CLASSCFG[foo] MAXNODEPERUSER=4
CLASSCFG[foo] MAXJOB[USER]=1
NODECFG[DEFAULT] PRIORITYF=-LOAD
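The HOSTLIST value above is a regular expression, so it is worth checking which hostnames it actually selects before relying on it. The sketch below tests a pattern against candidate names with grep -E. Whether Moab's regular-expression dialect matches POSIX extended syntax in every detail is an assumption, and hosts_matching is a hypothetical helper name.

```shell
#!/bin/bash
# hosts_matching PATTERN HOST [HOST ...]
# Print the hostnames that match PATTERN as a POSIX extended regex.
# Assumption: the scheduler interprets the pattern the same way grep -E does.
hosts_matching() {
    host_pattern=$1
    shift
    printf '%s\n' "$@" | grep -E -- "$host_pattern"
}
```

For instance, `hosts_matching 'node1[0-9]$' node1 node10 node19 node100 node20` prints node10 and node19 only: the pattern requires exactly one digit after "node1" at the end of the name.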
Running moab
● “mdiag -C”: Will check moab.cfg for errors
● “/opt/moab/sbin/moab”
● Startup scripts are in $BUILD_DIR/contrib
Troubleshooting
● “mdiag -R”: Shows what moab thinks is the status of the resource manager
● “showq”: shows jobs in the Running, Idle, and Blocked moab queues
● “checkjob -v $JOB_ID”
● “checknode $NODE_ID”
● “showstart $JOB_ID”
● Logs are in /opt/moab/log
Controlling moab
● “mschedctl -p”: Pauses moab
● “mschedctl -r”: Starts moab
● “mschedctl -R”: Re-reads moab.cfg
● “mschedctl -k”: Kill moab
● “mschedctl -L 7”: Sets log level
Moab Client
● Installed just like on the server
● Requires just the following line in moab.cfg:
SCHEDCFG[mysched] SERVER=mysched:42559
● msub, mjobctl – submit and control jobs through moab instead of the resource manager
● ADMINCFG[3] users allowed to run query commands (checknode, checkjob, etc.)
Examples
External Resources
● Moab Information, Download, and Docs:
http://www.adaptivecomputing.com/products/hpc-products/moab-hpc-suite-basic-edition/
● Torque Information, Download, Docs and User Community Lists:
http://www.adaptivecomputing.com/products/open-source/torque