NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER Using the Batch System 1 Using the Batch System at NERSC Mark Durst NERSC/USG ERSUG Training, Argonne, IL 28 April 1999

NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER

Using the Batch System 1

Using the Batch System at NERSC

Mark Durst

NERSC/USG

ERSUG Training, Argonne, IL

28 April 1999

Outline

• Quick example

• How batch processing works

• Batch and pipe queues

• How to submit jobs

• Monitoring jobs

• Reminders and Pointers

#!/bin/csh
#
# file: simple1
#
#QSUB -q serial
#QSUB -J y        # keep job log

set myname=`whoami`
set now=`date`
set mylocn=`pwd`

echo ""
echo "Hello $myname, this is your shell script $0,"
echo "running at $now."
echo ""
echo "Your current directory is $mylocn, which should"
echo "be the same as $HOME."
echo ""
echo "I'm going to sleep now."
echo ""

sleep 90

exit

% cqsub simple1
Task id t51847 inserted into database nqedb.

% cqstatl t51847
-----------------------------
NQE 3.3.0.9 Database Task Summary
-----------------------------
IDENTIFIER             NAME    SYSTEM-OWNER     OWNER    LOCATION            ST
--------------------   ------- ---------------- -------- ------------------- ----
t51847                 simple1 scheduler.main   mjdurst  NQE Database        NNew

% cqstatl t51847
-----------------------------
NQE 3.3.0.9 Database Task Summary
-----------------------------
IDENTIFIER             NAME    SYSTEM-OWNER     OWNER    LOCATION            ST
--------------------   ------- ---------------- -------- ------------------- ----
t51847                 simple1 scheduler.main   mjdurst  NQE Database        NPend

% cqstatl t51847
-----------------------------
NQE 3.3.0.9 Database Task Summary
-----------------------------
IDENTIFIER             NAME    SYSTEM-OWNER     OWNER    LOCATION            ST
--------------------   ------- ---------------- -------- ------------------- ----
t51847                 simple1 lws.mcurie       mjdurst  NQE Database        NSche

% cqstatl t51847
-----------------------------
NQE 3.3.0.9 Database Task Summary
-----------------------------
IDENTIFIER             NAME    SYSTEM-OWNER     OWNER    LOCATION            ST
--------------------   ------- ---------------- -------- ------------------- ----
t51847 (49939.mcurie)  simple1 lws.mcurie       mjdurst  nqs@mcurie          NSubm

% qstat 49939
---------------------------------
NQS 3.3.0.9 BATCH REQUEST SUMMARY
---------------------------------
IDENTIFIER    NAME     USER     LOCATION/QUEUE        JID  PRTY REQMEM REQTIM ST
------------- -------  -------- --------------------- ---- ---- ------ ------ ---
49939.mcurie  simple1  mjdurst  serial_short@mcurie   3753   25    364   1800 R03

% qstat 49939
nqs-100 qstat: CAUTION
  Request <49939>: not found.

% cqstatl t51847
-----------------------------
NQE 3.3.0.9 Database Task Summary
-----------------------------
IDENTIFIER             NAME    SYSTEM-OWNER     OWNER    LOCATION            ST
--------------------   ------- ---------------- -------- ------------------- ----
t51847 (49939.mcurie)  simple1 monitor.main     mjdurst  NQE Database        NComp

% ls -l
total 12
-rwxrw-r--   1 mjdurst  mpccc    365 Jan 15 10:47 simple1*
-rw-r--r--   1 mjdurst  mpccc      0 Jan 15 10:50 simple1.e51847
-rw-r--r--   1 mjdurst  mpccc   1285 Jan 15 10:50 simple1.l51847
-rw-r--r--   1 mjdurst  mpccc   2638 Jan 15 10:50 simple1.o51847

% cat simple1.l51847
01/15 10:48:13 Submitting to queue <serial> by <mjdurst(12113)>
01/15 10:48:13 Command line options: <-e /u1/mjdurst/tests/bat.simple/simple1.e51847 -J y -j /u1/mjdurst/tests/bat.simple/simple1.l51847 -lM 28mw 28mw -lT 1800 1800 -mu mjdurst@mcurie -o /u1/mjdurst/tests/bat.simple/simple1.o51847 -r simple1 -x -q serial>.
01/15 10:48:13 Script file options: <-q serial -J y # keep job log>.
01/15 10:48:15 Arrived in <serial@mcurie> from <mcurie>.
01/15 10:48:15 Request-id is <49939.mcurie>, Request name=<simple1>.
01/15 10:48:15 NQE Task ID is <nqedb.t51847>.
01/15 10:48:15 Origin uid=<12113>, Target username=<mjdurst>.
01/15 10:48:15 Account/Project name=<mpccc>, Account/Project ID=<105>.
01/15 10:48:15 Submission security level=<0>, compartments=<0>.
01/15 10:48:17 Account/Project name=<mpccc>, Account/Project ID=<105>.
01/15 10:48:17 Arrived in <serial_short@mcurie> from <serial@mcurie>.
01/15 10:48:20 Submission security level=<0>, compartments=<0>.
01/15 10:48:20 Execution security level=<0>, compartments=<0>.
01/15 10:48:23 Started, pid=<36967>, jid=<3753>, shell=</bin/csh>, umask=<18>.
01/15 10:48:23 Running in queue <serial_short>.
01/15 10:50:02 Finished.
01/15 10:50:02 Returning stderr output file.
01/15 10:50:03 Returning stdout output file.

% cat simple1.o51847
mcurie.nersc.gov, a Cray T3E-900 running UNICOS/mk 2.0.3.32
------------------------------Contact Information------------------------------
NERSC Web    http://www.nersc.gov/
ESnet Web    http://www.es.net/
ESCHER Web   http://www.nersc.gov/hardware/servers/vis-server.html

<snip>

CFS CONVERSION
CFS to HPSS conversion was successfully completed on January 7, 1999.
Users can access all of their CFS files on the new HPSS system, "archive".
The cfs command on the NERSC Crays now points to the new HPSS interface, hsi.
For more info on using hsi reference this URL:

http://www.nersc.gov/hardware/storage/hsi.ch1.html.

If your HPSS password fails or you don't have an HPSS account, contact
the Account Support group at 1-800-66NERSC, option 2, or (510) 486-8612
------------------------------------------------------------------------------

Your current working directory is /u/mpccc/mjdurst.

Hello mjdurst, this is your shell script /usr/spool/nqe/spool/scripts/++BBI+++++0+++,
running at Fri Jan 15 10:48:31 PST 1999.

Your current directory is /u1/mjdurst, which should
be the same as /u/mpccc/mjdurst.

I'm going to sleep now.

logout

Why Batch Processing?

• Batch queues are necessary:
  – On systems with many jobs
  – When scheduling is difficult
  – To assure greater throughput
• Interactive jobs are limited
  – J90: 10 hrs.
  – T3E: < 64 PEs, < 30 minutes parallel (1 hr serial)
• Some machines/processors batch-only
  – J90: all batch machines
  – T3E: many APP PEs (at night, almost all)

The Batch Process

• User creates shell script myscript
• Submits to NQE with cqsub myscript
  – Returns NQE task id (e.g., t4913)
• NQE forwards to NQS
  – J90: selects a machine (J90 wait time here)
• NQS runs the job
  – Assigns NQS job id (e.g., 6859.mcurie)
  – Selects a batch queue
  – Places the job there (T3E wait time here)
  – Runs it when appropriate
• NQS/NQE returns job logs at completion

Pipe Queues

• Groups of batch queues
  – Direct a job to a pipe queue with #QSUB -q serial
  – Default is production
• To see them: qstat -p
• T3E:
  – serial, debug, production, long
• J90:
  – production
  – batchk (for evening, weekend killeen queues)
  – batch{b,f,s,c,j} (not recommended)

Preparing for Batch Submission

• Write your shell script
  – C shell or Bourne/Korn shell
  – Starts in user’s home directory
• Debug interactively (if possible)
• Decide on needed resources
  – J90: CPU time, memory
  – T3E: amount of parallel, serial time; number of PEs
• Select other #QSUB options
• Check for appropriate queue and submit

Essential options to cqsub (#QSUB directives)

• J90:
  – -lM <mem>
  – -lT <time>
• T3E:
  – -l mpp_p <num>
  – -l mpp_t <par_time>
  – -lT <ser_time>
  – don’t use -lM
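As an illustration of how these directives sit inside a script, here is a minimal Bourne-shell sketch for a J90 job. The #QSUB lines are comments to the shell, so the script also runs interactively for debugging. The limit values (8mw, 600) and the describe_job helper are hypothetical placeholders; the directive spellings follow the forms shown in the job log of the quick example (-lM 28mw, -lT 1800).

```shell
#!/bin/sh
#QSUB -q production    # pipe queue (production is the default)
#QSUB -lM 8mw          # memory request: hypothetical value
#QSUB -lT 600          # CPU time request in seconds: hypothetical value
#QSUB -J y             # keep job log

# describe_job: placeholder for the real work of the job (hypothetical helper).
describe_job() {
    echo "job started in $(pwd)"
}

describe_job
```

Run the script by hand first; once it behaves, submit the same file with cqsub.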

Other cqsub options

• -J y: save job log (recommended)
• -j <file>: save it in file
• -mb: send mail when job starts (-me: ends)
• -a <time>: hold job until after time
• -o <file>: put standard output in file
  – default name: <batfile>.o<id>
• -eo: combine standard error and output
  – makes output look like terminal record
• -x: exports user’s environment to job
• -s <shell>: specify shell
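The default file names can be predicted from the script name and the task id. The sketch below assumes the pattern seen in the quick example, where task t51847 for script simple1 produced simple1.o51847, simple1.e51847 and (with -J y) simple1.l51847. The function name default_output_names is hypothetical.

```shell
#!/bin/sh
# default_output_names SCRIPT TASKID
# Print the default stdout, stderr and job log file names for a task,
# following the <batfile>.o<id> pattern (assumed to extend to .e and .l).
default_output_names() {
    script=$1
    id=${2#t}                      # drop the leading "t" of the NQE task id
    for kind in o e l; do
        printf '%s.%s%s\n' "$script" "$kind" "$id"
    done
}

default_output_names simple1 t51847
```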

Job Submission

• cqsub <file>
• Can give options at submission time
  – Override file options
  – Less dependable
• If no file name, expects commands from terminal
  – Useful behavior in automated script generation & submission
• Response: Task id t16839 inserted into database nqedb.
  – Task id useful for tracking with cqstatl.
• Don’t break (Ctrl-C) out of cqsub!
  – Instead, allow to finish, then use cqdel
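When submission is automated, the task id has to be pulled out of the cqsub response. A minimal sketch, assuming the response always has the form shown on this slide; extract_task_id is a hypothetical helper name, and the response is hard-coded here so the example runs without NQE.

```shell
#!/bin/sh
# extract_task_id RESPONSE
# Print the NQE task id from a response line of the form
#   Task id t16839 inserted into database nqedb.
# Prints nothing if the line does not match.
extract_task_id() {
    printf '%s\n' "$1" |
        sed -n 's/^Task id \(t[0-9][0-9]*\) inserted into database .*$/\1/p'
}

response="Task id t16839 inserted into database nqedb."
task_id=$(extract_task_id "$response")
echo "$task_id"
```

In a real script the response would come from the submission itself, for example response=$(cqsub myscript), and the saved id can then be passed to cqstatl or cqdel.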

Monitoring Jobs

• cqstatl <taskid>
  – cqstatl -a | grep <username> (if no <taskid>)
• ST column (“status”) indicates progress
  – NNew, NPend, NSche: still in NQE
  – NSubm: submitted to NQS
  – NComp: done
  – NTerm: killed
  – NFail: job failed (user or system error)
• IDENTIFIER column holds NQS job id (once submitted)
• cqstatl -f <taskid>: details for your job
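The status codes above can be decoded in a script. The sketch below takes the status from the last field of a cqstatl summary line, which is where the ST column appears in the quick example; describe_status is a hypothetical helper, and the sample line is copied from that example rather than read from NQE.

```shell
#!/bin/sh
# describe_status CODE
# Translate a cqstatl ST code into the meaning given on the slide.
describe_status() {
    case "$1" in
        NNew|NPend|NSche) echo "still in NQE" ;;
        NSubm)            echo "submitted to NQS" ;;
        NComp)            echo "done" ;;
        NTerm)            echo "killed" ;;
        NFail)            echo "job failed (user or system error)" ;;
        *)                echo "unknown status" ;;
    esac
}

# Sample summary line; the status is its last whitespace-separated field.
line="t51847 simple1 scheduler.main mjdurst NQE Database NPend"
status=$(printf '%s\n' "$line" | awk '{ print $NF }')
echo "$status: $(describe_status "$status")"
```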

Monitoring Jobs (cont’d)

• T3E: qstat <jobid> once your job reaches NQS
  – cqstatl -d nqs = qstat
  – qstat -au <username> (if no <jobid>)
• J90: qstat -h <hostname> <jobid>
  – Find hostname from NQS id (from cqstatl)
  – e.g., 2861.seymour
• ST column (“status”) now indicates
  – RNN: Running (with NN processes)
  – Qxy: waiting in the queue (xy encodes reason)
    • man qstat to decode
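Finding the hostname from an NQS id is a simple string split at the first dot. The sketch below only builds and prints the J90 qstat command line instead of running it; the helper names are hypothetical, and passing just the numeric part as <jobid> is an assumption based on the quick example (qstat 49939 for 49939.mcurie).

```shell
#!/bin/sh
# Split an NQS id such as 2861.seymour into its two parts.
nqs_seq()  { printf '%s\n' "${1%%.*}"; }   # text before the first dot
nqs_host() { printf '%s\n' "${1#*.}"; }    # text after the first dot

id="2861.seymour"
echo "qstat -h $(nqs_host "$id") $(nqs_seq "$id")"
# prints: qstat -h seymour 2861
```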

% cqstatl -a
-----------------------------
NQE 3.3.0.9 Database Task Summary
-----------------------------
IDENTIFIER             NAME    SYSTEM-OWNER     OWNER    LOCATION            ST
--------------------   ------- ---------------- -------- ------------------- ----
t48217 (46356.mcurie)  PCM     lws.mcurie       alewife  nqs@mcurie          NSubm
t48713 (46848.mcurie)  third   lws.mcurie       u6670    nqs@mcurie          NSubm
t49200 (47518.mcurie)  int566A lws.mcurie       u61176   nqs@mcurie          NSubm
t49245 (47368.mcurie)  xqcd_ho lws.mcurie       snm      nqs@mcurie          NSubm
t50349 (48480.mcurie)  int650  lws.mcurie       u61176   nqs@mcurie          NSubm
t50881 (49338.mcurie)  lte34-0 lws.mcurie       lungfish nqs@mcurie          NSubm

<snip>

t51870                 case17c scheduler.main   salmon   NQE Database        NTerm
t51871                 case1c9 scheduler.main   salmon   NQE Database        NFail
t51872                 case16c scheduler.main   salmon   NQE Database        NPend
t51873 (49967.mcurie)  q_lsms  lws.mcurie       marlin   nqs@mcurie          NSubm
t51875                 case11c scheduler.main   salmon   NQE Database        NPend
t51877 (49970.mcurie)  G08     lws.mcurie       u66870   nqs@mcurie          NSubm
t51878 (49971.mcurie)  qHsig.3 lws.mcurie       bass     nqs@mcurie          NSubm
t51881 (49975.mcurie)  Jobge_b lws.mcurie       carp     nqs@mcurie          NSubm
t51884 (49979.mcurie)  job16.a lws.mcurie       adt      nqs@mcurie          NSubm
t51885 (49980.mcurie)  run_dyn lws.mcurie       flounder nqs@mcurie          NSubm
t51886 (49981.mcurie)  jupiter lws.mcurie       grouper  nqs@mcurie          NSubm
t51887 (49983.mcurie)  JobCZ.b lws.mcurie       tarpon   nqs@mcurie          NComp

(output greatly abridged)

% qstat -a
---------------------------------
NQS 3.3.0.9 BATCH REQUEST SUMMARY
---------------------------------
IDENTIFIER    NAME     USER     LOCATION/QUEUE        JID  PRTY REQMEM REQTIM ST
------------- -------  -------- --------------------- ---- ---- ------ ------ ---
49979.mcurie  job16.ag adt      pe32@mcurie           4164   25    255   1520 R03
49936.mcurie  akr520   u6677    pe32@mcurie           3732   25    323   1800 R03
49964.mcurie  case14c9 salmon   pe32@mcurie           3944   25    255   1795 R03
49967.mcurie  q_lsms   marlin   pe32@mcurie                 999  28672   1800 Cge
49983.mcurie  JobCZ.bb tarpon   pe32@mcurie                 317  28672   1800 Qge
49984.mcurie  bitgc11  u62098   pe32@mcurie                 244  28672   1800 Qge
49985.mcurie  bitgc11  u62098   pe32@mcurie                 242  28672   1800 Qge
49362.mcurie  Job_a2   carp     pe128@mcurie          5308   25    323   1800 R03
49335.mcurie  script.2 sturgeon pe256@mcurie                999  28672   1800 Qqs
49033.mcurie  uo2_3h2o dorado   gc128@mcurie                ---  28672   7200 Hop
49255.mcurie  run010_A bluegill long128@mcurie        4617   25    255   1800 R03
49276.mcurie  sg3D10   aku      long128@mcurie              999   4096   1800 Qce
49277.mcurie  sg3D10   aku      long128@mcurie              999   4096   1800 Qqu
49867.mcurie  run_t4   flounder long128@mcurie               70  28672   1800 Cgg
no pipe queue entries

(output greatly abridged)

% qstat -f pe32
------------------------------------
NQS 3.3.0.9 BATCH QUEUE: pe32@mcurie          Status: ENABLED/RUNNING
------------------------------------
                                              Priority: 15
<ENTRIES>
  Total: 17
  Running: 5     Queued: 12     Waiting: 0
  Holding: 0     Arriving: 0    Exiting: 0
<RUN LIMITS>
  Queue: 13      User: 2        Group: 20
<COMPLEX MEMBERSHIP>
  regular
<LOCAL SCHEDULER EXTENSIONS>
  Miser Queue: unspecified      Scheduling Window: 0:0.0
<RESOURCE USAGE>
                              LIMIT                ALLOCATED
  Memory Size                 unlimited            143360kw
  Quick File Space            0b                   0kw
  MPP Processor Elements      416                  60
<REQUEST LIMITS>
                              PER-PROCESS          PER-REQUEST
  type a Tape Drives                               unspecified (0)
  type b Tape Drives                               unspecified (0)
  type c Tape Drives                               unspecified (0)
  type d Tape Drives                               unspecified (0)
  type e Tape Drives                               unspecified (0)
  type f Tape Drives                               unspecified (0)
  type g Tape Drives                               unspecified (0)
  type h Tape Drives                               unspecified (0)
  Core File Size              unspecified (256mw)
  Data Size                   unspecified (256mw)
  Permanent File Space        20gb                 25gb
  Memory Size                 28mw                 29mw
  Nice Increment              5
  Quick File Space            unspecified (0b)     0b
  Stack Size                  unspecified (256mw)
  CPU Time Limit              3600sec              7200sec
  Temporary File Space        unspecified (0b)     unspecified (0b)
  Working Set Limit           unspecified (256mw)
  MPP Processor Elements                           32
  MPP Time Limit              15000sec             15000sec
  Shared Memory Limit                              unspecified (0mw)
  Shared Memory Segments                           unspecified (0)
  MPP Memory Size             unspecified (256mw)  unlimited
<ACCESS>
  Route: Pipe Only    Users: Unrestricted
<CUMULATIVE TIME>
  System Time: 3563114615067464.00 secs
  User Time: 281421545294442428.00 secs

Troubleshooting

• No task id returned
  – Typically means NQE down
  – message like “Can’t connect”
• Job doesn’t make it to NQS: try cqstatl <taskid>
  – NFail usually indicates submission error
  – Nabort could be a system problem
  – No listing if many days old (NQE database is purged frequently)
• Stuck in NPend status
  – J90: Many jobs ahead of you?
  – T3E: over pipe queue limit?

Troubleshooting (cont’d)

• Stuck in NSubm: use qstat
  – Q: normal on T3E, rare on J90
  – T3E:
    • Hop can be allocation problem
    • C (“checkpointed”) may be daily shuffling
    • May need both pslist and qstat -m to sort it all out
• Job crashes
  – Read job log, stdout, stderr
  – ...limit exceeded: ran out of time (or memory, or…)
• Job vanishes
  – Did machine(s) crash? If not, collect info and contact Consultants
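A quick first check after a crash is to scan the returned files for a "limit exceeded" message. The sketch below is only a filter over text on standard input; find_limit_errors is a hypothetical helper, and the sample message wording ("CPU time limit exceeded") is illustrative, since the slide elides the exact text.

```shell
#!/bin/sh
# find_limit_errors: read log text on stdin and print any line that
# mentions an exceeded limit (case-insensitive match).
find_limit_errors() {
    grep -i 'limit exceeded'
}

# Hypothetical excerpt standing in for a job's log and stderr files.
printf '%s\n' \
    "01/15 10:48:23 Running in queue <serial_short>." \
    "CPU time limit exceeded" \
    "01/15 10:50:02 Finished." | find_limit_errors
```

With real files from the quick example, the same filter could be fed with cat simple1.l51847 simple1.e51847 simple1.o51847.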

Pointers

• Batch job is like a login session
  – Starts in your home directory
  – Uses your startup files
  – But doesn’t inherit environment (unless you use -x)
• Environment variable ENVIRONMENT
  – Not set in interactive work, set to BATCH in batch jobs
  – Can be used to exclude parts of startup files
• /usr/tmp faster than home directory
  – $TMPDIR vanishes (avoids littering)
  – Just one quota for $TMPDIR, rest of /usr/tmp/
  – Can’t monitor batch J90 temp file systems
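The ENVIRONMENT variable makes it possible to skip interactive-only setup in a startup file. A minimal Bourne-shell sketch of that test follows; is_batch is a hypothetical helper name, and the two echo lines stand in for whatever the startup file would really do.

```shell
#!/bin/sh
# is_batch: succeed only when ENVIRONMENT is set to BATCH,
# as it is inside a batch job.
is_batch() {
    [ "${ENVIRONMENT:-}" = "BATCH" ]
}

if is_batch; then
    echo "batch job: skipping terminal setup"
else
    echo "interactive session: running terminal setup"
fi
```

C shell users would put the equivalent test in their .cshrc or .login; the logic is the same.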

Pointers (cont’d)

• Don’t submit blindly
  – Debug executables, scripts first
  – Don’t trust inherited shell scripts
  – Spend time with man pages
• J90: large memory jobs should/must multitask
• T3E: reduce serial time in parallel jobs
  – “Stage” HPSS retrievals (dmget)
  – Submit follow-on serial jobs within your job