Never Lose a SAS Job
Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Page 1: Never Lose a SAS Job

Never Lose a SAS Job

Page 2: Never Lose a SAS Job

Not Again!!

Unexpected reboots, system failures

Long-running job didn’t complete

Must manually restart the job from step 1

It can drive you crazy!!!

Page 3: Never Lose a SAS Job

SAS Grid Gets the Stars Aligned...

SAS checkpoint-restart features

+ LSF requeue capabilities

+ SASGSUB batch submission utility

---------------------------------------------------

Completion of SAS Jobs in Minimal Time

Ideal for critical long-running SAS jobs

Page 4: Never Lose a SAS Job

SAS Checkpoint/Restart

Checkpoint mode
• Record info about data/proc steps in checkpoint library

Restart mode
• Global statements and macros re-executed
• SAS reads data in checkpoint library to determine which steps completed
• Program execution resumes with step that was executing when failure occurred
• Data/proc steps that completed successfully will not be re-executed

Page 5: Never Lose a SAS Job

To Set Up for Checkpoint-Restart

Specify the following options on batch SAS invocation:
• STEPCHKPT – enables checkpoint mode
• STEPRESTART – causes SAS to use checkpoint-restart data
• NOWORKINIT – does not initialize the WORK library when SAS starts
• NOWORKTERM – saves the WORK library when SAS exits
• ERRORCHECK STRICT – puts SAS in syntax check mode when an error occurs in LIBNAME, FILENAME, %INCLUDE, and LOCK statements
• ERRORABEND – causes SAS to terminate for most errors

Page 6: Never Lose a SAS Job

The WORK Directory

WORK is default location for checkpoint library
• Can use STEPCHKPTLIB to point to permanent library
• Must include libname as first statement in batch program

WORK directory must be on shared storage

Example:
• sas92 -noworkinit -noworkterm -work abc
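The options from the setup slide and the WORK example can be combined into one batch invocation. The sketch below only assembles the argument list; the executable name `sas`, the helper name `build_sas_command`, and the program and directory paths are assumptions for illustration, not values from the slides.

```python
def build_sas_command(program, work_dir, sas_exe="sas"):
    """Return an argv list for a batch SAS run with checkpoint-restart enabled.

    sas_exe, program and work_dir are placeholders; adjust to the local install.
    """
    return [
        sas_exe,
        "-sysin", program,        # batch program to run
        "-work", work_dir,        # WORK location, expected to be on shared storage
        "-stepchkpt",             # record checkpoints for DATA/PROC steps
        "-steprestart",           # use checkpoint data if any exists
        "-noworkinit",            # keep WORK contents when SAS starts
        "-noworkterm",            # keep WORK contents when SAS exits
        "-errorcheck", "strict",  # syntax check mode on LIBNAME/FILENAME/%INCLUDE/LOCK errors
        "-errorabend",            # terminate on most errors
    ]


if __name__ == "__main__":
    # Hypothetical program and shared WORK directory.
    print(" ".join(build_sas_command("nightly.sas", "/shared/saswork/nightly")))
```

Passing the same option set on every run fits the slides' point that the first invocation simply finds no checkpoint data and runs from the top.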

Page 7: Never Lose a SAS Job

Use of Both STEPCHKPT and STEPRESTART

Initial invocation
• Results in checkpoint mode only
• No data in checkpoint library

Subsequent invocations
• Uses data from checkpoint library
• Continues checkpoint mode for remainder of program

Page 8: Never Lose a SAS Job

SAS Grid Manager – Queues

[Diagram labels: SAS Application, SAS Grid Manager, Normal Queue, HOST A, HOST B, HOST C]

Page 9: Never Lose a SAS Job

Automatic Job Requeue

Configure queue to automatically requeue job with specific exit value
• REQUEUE_EXIT_VALUES=all ~0 ~1
  − Any exit code other than 0 or 1 (success & warnings) will be requeued
• REQUEUE_EXIT_VALUES=EXCLUDE(all ~0 ~1)
  − Run requeued job on different host
• Jobs requeued 5 times by default
  − MAX_JOB_REQUEUE lets you configure the requeue limit; it can be specified globally for all queues or on a per-queue basis
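The requeue rule can be read as a small decision function. The following is a simplified model of that rule, not LSF's own parser: it understands only `all`, plain exit codes, and `~N` exclusions, and it ignores the `EXCLUDE(...)` form. The function names and the `requeue_count` / `max_requeue` parameters are assumptions for illustration.

```python
def parse_requeue_spec(spec):
    """Split a spec such as 'all ~0 ~1' into (match_all, included, excluded)."""
    match_all = False
    included, excluded = set(), set()
    for token in spec.split():
        if token.lower() == "all":
            match_all = True
        elif token.startswith("~"):
            excluded.add(int(token[1:]))   # '~N' removes exit code N
        else:
            included.add(int(token))       # bare number adds exit code N
    return match_all, included, excluded


def should_requeue(exit_code, spec, requeue_count, max_requeue):
    """Decide whether a finished job goes back on the queue."""
    if requeue_count >= max_requeue:
        return False                       # requeue limit reached
    match_all, included, excluded = parse_requeue_spec(spec)
    if exit_code in excluded:
        return False                       # e.g. 0 (success) or 1 (warnings)
    return match_all or exit_code in included


if __name__ == "__main__":
    for code in (0, 1, 2, 137):
        print(code, should_requeue(code, "all ~0 ~1", requeue_count=0, max_requeue=5))
```

With the spec from the slide, exit codes 0 and 1 are left alone and any other code sends the job back to the queue until the limit is reached.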

Page 10: Never Lose a SAS Job

Automatic Job Rerun

A job is automatically rerun when
• Execution host becomes unavailable while a job is running
• System fails while a job is running
• RERUNNABLE=yes

Page 11: Never Lose a SAS Job

LSF Queue Definition

Begin Queue
QUEUE_NAME   = sas_rerun
PRIORITY     = 40
NICE         = 10
RERUNNABLE   = YES
REQUEUE_EXIT_VALUES = all ~0 ~1
DESCRIPTION  = Jobs submitted to this queue will be requeued automatically and also rerunnable.
End Queue

Jobs dispatched from this queue will be rerun after a system failure

Jobs with a fatal exit code will be requeued

Page 12: Never Lose a SAS Job

SASGSUB Capabilities

Standalone utility that allows a user to
• Submit SAS program to grid for processing
• Display status of user’s jobs on the grid
• Retrieve output from user’s jobs to local directory
• Kill jobs

Page 13: Never Lose a SAS Job

Using SASGSUB

Advantages
• Submit and forget
• View job output while job is running
• Eliminate need for full SAS install on client
• Make use of SAS checkpoint/restart capability

NOTE - requires shared file system between client and grid

Page 14: Never Lose a SAS Job

Submitting a Job

Command line interface
• sasgsub -gridsubmitpgm <sas_pgm>

Example output

Job ID: 6772
Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"
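A script that submits jobs usually needs the job ID back in order to query status later. The sketch below pulls the `Job ...:` fields out of text shaped like the example output above; it assumes one field per line and that format only, and the helper name `parse_submit_output` is hypothetical.

```python
import re


def parse_submit_output(text):
    """Collect 'Job <field>: value' lines from submit output into a dict."""
    info = {}
    for line in text.splitlines():
        match = re.match(r'\s*(Job [^:]+):\s*(.+?)\s*$', line)
        if match:
            key, value = match.groups()
            info[key] = value.strip('"')   # drop the surrounding quotes on paths
    return info


if __name__ == "__main__":
    sample = (
        'Job ID: 6772\n'
        'Job directory: "/CNT/sasgsub/gridwork/sascnn1/'
        'SASGSUB-2009-03-17_14.09.52.847_testPgm"\n'
        'Job log file: "/CNT/sasgsub/gridwork/sascnn1/'
        'SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"\n'
    )
    print(parse_submit_output(sample)["Job ID"])
```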

Page 15: Never Lose a SAS Job

Submitting a Job for Checkpoint-Restart

GRIDRESTARTOK
• Automatically adds the following options to batch SAS invocation
  − STEPCHKPT, STEPRESTART, ERRORCHECK STRICT, ERRORABEND, NOWORKINIT, NOWORKTERM
• Sets RERUNNABLE parm on job

Command line interface
• sasgsub -gridsubmitpgm <sas_pgm> -gridrestartok

Page 16: Never Lose a SAS Job

Getting Job Status

Command line interface
• sasgsub -gridgetstatus <job_id | _ALL_>

Example output

Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57

Page 17: Never Lose a SAS Job

Retrieving Results

Command line interface
• sasgsub -gridgetresults <job_id | _ALL_>

Example output

Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-21_21.52.57.130_testPgm
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-24_13.13.39.167_testPgm
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:53:34

Page 18: Never Lose a SAS Job

Putting It All Together

[Diagram labels: SAS Application, SAS Grid Manager, normal queue, sas_rerun queue, HOST A, HOST B, HOST C]

Page 21: Never Lose a SAS Job

A simple solution

Record a checkpoint number, save it in WORK

If restarting, skip PROC / DATA steps up to that checkpoint

Tokenize everything

Execute all global statements
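The idea on this slide can be illustrated outside SAS. The sketch below is an analogy in Python, not how SAS implements it: steps are plain callables, the checkpoint is a JSON file holding the number of the last completed step, and `run_with_checkpoint`, `global_setup`, and the file layout are all assumptions made for the example. Tokenizing skipped steps is not modeled.

```python
import json
import os


def run_with_checkpoint(steps, checkpoint_path, global_setup=None):
    """Run (name, callable) steps in order, skipping steps a prior run finished.

    Returns the names of the steps executed in this invocation.
    """
    # Global statements always run again, even on restart.
    if global_setup is not None:
        global_setup()

    # Read the last completed step number, or start from zero.
    completed = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            completed = json.load(f)["completed"]

    executed = []
    for number, (name, func) in enumerate(steps, start=1):
        if number <= completed:
            continue  # finished in an earlier run, do not re-execute
        func()        # an exception here leaves the checkpoint at the previous step
        executed.append(name)
        # Record the checkpoint only after the step succeeds.
        with open(checkpoint_path, "w") as f:
            json.dump({"completed": number}, f)
    return executed
```

If the second of three steps fails, the checkpoint file still records step 1, so the next call with the same file resumes at step 2, which mirrors resuming with the step that was executing when the failure occurred.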