
Evolution of BOSS, a tool for job submission and tracking

W. Bacchi, G. Codispoti, C. Grandi, INFN Bologna
D. Colling, B. MacEvoy, S. Wakefield, Y. Zhang, Imperial College London

Stuart Wakefield, Imperial College London


Outline

• Introduction to BOSS.

• Previous features and usage.

• New functionality.

• Reengineering of the design.

• Current status and plans.


Introduction

• BOSS: Batch Object Submission System.
• See previous talk at CHEP03, monitoring track, THET001.
• A tool for batch job submission, real-time monitoring and bookkeeping.
• Interfaced to many schedulers, both local and grid.
• Uses a relational database for persistency.
• Full logging and bookkeeping information stored.
• Job commands: submit, kill, query and output retrieval.
• Custom job types can be defined, allowing monitoring specific to the submitted application.


BOSS in CMS computing

• Used in CMS MC production for 4 years.
• The prototype CMS distributed analysis system (GROSS) was based on BOSS, and a later analysis system also uses BOSS.
• Last year it was decided that the BOSS architecture needed to be redesigned to meet the changing requirements of CMS computing.

[Diagram. Labels: BOSS; logging & bookkeeping; monitoring; production / analysis tool.]


V3.x workflow I

[Diagram. Labels: boss submit, boss query, boss kill; BOSS; BOSS DB; scheduler; farm nodes; wrapper.]

• User specifies job parameters, including:
– Executable name.
– Executable type, which turns on customized monitoring.
– Output files to retrieve (for the grid and for sites without a shared file system).
• User tells BOSS to submit jobs, specifying the scheduler, e.g. PBS, LSF, SGE, Condor, LCG, gLite etc.
• A job consists of the job wrapper, the real-time monitoring service and the user's executable.


V3.x workflow II

• Once running, the wrapper starts the real-time monitoring services and the user's executable.
• It writes all logging information (start time, finish time, exit code etc.) to a local journal file.
• Monitoring services parse the job output, looking for regular expressions specified by the job type.
• Monitoring information is saved to the journal file and returned to the user via a database connection to the BOSS DB or via R-GMA (if possible).

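[Diagram: monitoring example. Labelled components: BOSSjobExecutor, BOSSdbUpdator, BOSS DB.]

User job:

    #!/usr/bin/perl
    # Print an incrementing counter once per second.
    $i = 0;
    while ($i < 3) {
        sleep(1);
        $i++;
        print "counter $i\n";
    }

Job output:

    counter 1
    counter 2
    counter 3

Filter:

    #!/usr/bin/perl
    # Extract the counter value from each line of job output.
    while (<STDIN>) {
        if ($_ =~ /.*counter\s+(\d+).*/) {
            print "COUNTER=$1\n";
        }
    }

Filter output:

    COUNTER=1
    COUNTER=2
    COUNTER=3

Journal:

    1234 test counter 1
    1234 test counter 2
    1234 test counter 3

BOSS DB, table test:

    JOBID COUNTER
    12345 0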
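The filter step on this slide can also be sketched in Python. This is an illustration only, not BOSS code: the function names `filter_output` and `journal_lines` are hypothetical, and the journal layout `<job id> <job type> <variable> <value>` is an assumption inferred from the journal excerpt on the slide.

```python
import re
from typing import Iterable, Iterator, Tuple

# Mirrors the Perl filter on the slide: capture the integer after "counter".
COUNTER_RE = re.compile(r"counter\s+(\d+)")


def filter_output(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (variable, value) pairs for each job output line that matches."""
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            yield ("counter", match.group(1))


def journal_lines(job_id: int, job_type: str, lines: Iterable[str]) -> Iterator[str]:
    """Format matches as journal records (layout is an assumption, see above)."""
    for name, value in filter_output(lines):
        yield f"{job_id} {job_type} {name} {value}"
```

Calling `list(journal_lines(1234, "test", ["counter 1", "noise", "counter 2"]))` would give two records; lines that do not match the pattern are simply ignored, as in the Perl filter.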


V3.x workflow III

• Using BOSS, the user can get the status of jobs, pulling in information from the BOSS DB, the scheduler and the real-time monitoring DB.
• When a job finishes, its output is automatically stored at the final destination if possible (e.g. a shared file system on a local cluster); if not (e.g. LCG), the output must be fetched with a separate BOSS command.
• If real-time monitoring is not available (e.g. because of a firewall), the BOSS DB can be updated from the journal file.

% boss q -all -specific -type test

ID  S_USR   EXECUTABLE  ST  EXE_HOST    START TIME      STOP TIME       counter
1   grandi  test.pl 15  E   pccms10.bo  14:30:00 06/06  14:30:16 06/06  3
2   grandi  test.pl 15  R   pccms10.bo  14:30:02 06/06  --------------  2
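The journal-based fallback mentioned on this slide (updating the DB from the journal file when real-time monitoring is unavailable) might look roughly like the sketch below. It is not the BOSS implementation: the `monitor` table, its columns, the function name `replay_journal` and the four-field journal layout are all assumptions made for illustration, and SQLite stands in for whatever database is in use.

```python
import sqlite3
from typing import Iterable


def replay_journal(conn: sqlite3.Connection, journal: Iterable[str]) -> int:
    """Apply journal records to a (hypothetical) monitor table; return the count applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monitor ("
        "job_id INTEGER, job_type TEXT, name TEXT, value TEXT, "
        "PRIMARY KEY (job_id, name))"
    )
    applied = 0
    for line in journal:
        fields = line.split()
        # Assumed record layout: <job id> <job type> <variable> <value>.
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skip malformed or partial records
        job_id, job_type, name, value = fields
        # Later records replace earlier ones, so the table keeps the latest value.
        conn.execute(
            "INSERT OR REPLACE INTO monitor VALUES (?, ?, ?, ?)",
            (int(job_id), job_type, name, value),
        )
        applied += 1
    conn.commit()
    return applied
```

Because records are replayed in order and keyed on job and variable, replaying the same journal twice leaves the table unchanged.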


Proposed changes

• Following experience from CMS MC and distributed analysis systems, it was decided to re-engineer BOSS.
• Provide C++ and Python APIs (via SWIG) to allow higher-level tools to steer BOSS.
• Introduce task, chain and program:
– Program is the user's executable.
– Chain is an arbitrarily complex set of different programs run on the same worker node.
– Task is a group of homogeneous jobs that may be executed in parallel.
• To describe the new task hierarchy, move to XML task descriptions.
• Separate bookkeeping from real-time monitoring.
• Improve real-time monitoring but leave it optional.
• Allow multiple real-time monitoring mechanisms.
• Allow pluggable chaining tools, e.g. ShReek (CHEP06 id 276).
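One way to picture the task, chain and program hierarchy is as nested data classes. The sketch below is illustrative only and does not reflect the actual BOSS C++ or Python API: the class layout, the field names and the helper `homogeneous_task` are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Program:
    """A single user executable."""
    executable: str
    args: str = ""
    program_type: str = ""  # assumed to select program-specific monitoring


@dataclass
class Chain:
    """Programs that run together on the same worker node."""
    programs: List[Program] = field(default_factory=list)
    scheduler: str = ""


@dataclass
class Task:
    """A group of homogeneous chains that may run in parallel."""
    chains: List[Chain] = field(default_factory=list)

    def program_count(self) -> int:
        return sum(len(chain.programs) for chain in self.chains)


def homogeneous_task(executable: str, n: int, scheduler: str) -> Task:
    """Build n single-program chains that differ only in their argument."""
    return Task([Chain([Program(executable, args=str(i))], scheduler) for i in range(n)])
```

For example, `homogeneous_task("test.pl", 100, "glite")` would describe 100 chains of one program each.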


Logging and Monitoring

• Separate the user's logging DB from the (optional) monitoring DB.
• Only allow access to the logging DB via BOSS tools, i.e. remove all server requirements (this allows a personal DB implemented in SQLite on local disk).
• Fill the logging database with BOSS tools, using information from the monitoring DB and from the journal file retrieved at the end of the job.
• The real-time server is updated by an updater on the worker node. The transport mechanism may use a proxy server.
• Possible implementations of the real-time update mechanism: R-GMA, MonaLisa etc.
• Allow a different RT mechanism for each job.
• Information in the monitoring database expires.
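The ideas of interchangeable real-time mechanisms and expiring monitoring information can be sketched as a small interface with one toy implementation. Everything here is hypothetical: `RealTimeUpdater`, `InMemoryMonitor` and the `ttl` parameter are names invented for illustration, and a real updater would talk to R-GMA, MonaLisa or a database rather than a dictionary.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Tuple


class RealTimeUpdater(ABC):
    """Interface that each real-time mechanism would implement (assumed)."""

    @abstractmethod
    def send(self, job_id: int, name: str, value: str) -> None:
        ...


class InMemoryMonitor(RealTimeUpdater):
    """Toy monitoring store whose entries expire after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data: Dict[Tuple[int, str], Tuple[str, float]] = {}

    def send(self, job_id: int, name: str, value: str) -> None:
        # Store the value together with the time it was received.
        self._data[(job_id, name)] = (value, self._clock())

    def get(self, job_id: int, name: str) -> Optional[str]:
        entry = self._data.get((job_id, name))
        if entry is None:
            return None
        value, stamp = entry
        if self._clock() - stamp > self._ttl:
            del self._data[(job_id, name)]  # expired: drop and report nothing
            return None
        return value
```

Choosing a mechanism per job then reduces to picking which `RealTimeUpdater` instance the wrapper is given.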


New data flow

[Diagram. Components: user interface; BOSS client; BOSS DB; local or grid scheduler; real-time BOSS DB server; worker node; BOSS job wrapper; user process; BOSS real-time updater; BOSS journal. Labelled flows: submit or control job; get job running status; pop job monitoring info; job control and logging; file I/O control; set job logging info (possibly via proxy); retrieve output files.]


New job wrapper

• The job wrapper will start the chainer and monitoring modules.
• The job chainer will launch each executable separately within its own environment.
• The job wrapper will provide two levels of monitoring: job level and executable level.
– Job-level monitoring includes overall variables such as total time, total memory usage etc.
– Executable-level monitoring will monitor the executable's progress and journal.
• Future plans include allowing action to be taken if certain conditions are met, e.g. running out of memory or detecting infinite loops.

[Diagram. Labels: JobExecuter (wrapper); JobChaining; JobMonitor (real-time updater); Chain; Journal; Task with TaskExecutor (shown twice); Program with ProgramExecuter. Each executor box shows stdin, user exec, pre-filter, runtime-filter, post-filter, stdout and stderr.]
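A linear chainer of the kind described on this slide could be sketched as follows. This is a simplified illustration, not the BOSS wrapper: the function name `run_chain`, the journal record fields and the stop-on-first-failure policy are assumptions.

```python
import os
import subprocess
import sys
import time
from typing import Dict, List, Sequence


def run_chain(commands: Sequence[Sequence[str]],
              envs: Sequence[Dict[str, str]]) -> List[dict]:
    """Run each command in turn with its own environment; stop at the first failure."""
    journal = []
    for command, extra_env in zip(commands, envs):
        env = dict(os.environ)  # copy, so one program's settings never leak into the next
        env.update(extra_env)
        start = time.time()
        result = subprocess.run(command, env=env, capture_output=True, text=True)
        journal.append({
            "command": list(command),
            "exit_code": result.returncode,
            "elapsed": time.time() - start,  # executable-level timing
            "stdout": result.stdout,
        })
        if result.returncode != 0:
            break  # assumed policy: later programs depend on earlier ones
    return journal


if __name__ == "__main__":
    steps = [[sys.executable, "-c", "import os; print(os.environ['STEP'])"]]
    records = run_chain(steps, [{"STEP": "first"}])
    # Job-level total time is the sum of the per-executable timings.
    print(records[0]["stdout"].strip(), sum(r["elapsed"] for r in records))
```

Keeping one record per executable gives the executable-level view, while sums over the records give job-level totals such as total time.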


Sample Task specification

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<task>
  <iterator name="ITR" start="0" end="100" step="1">
    <chain scheduler="glite" rtupdater="mysql" ch_tool_name="jobExecutor">
      <program exec="test.pl"
               args="ITR"
               stderr="err_ITR"
               program_type="test"
               stdin="in"
               stdout="out_ITR"
               infiles="Examples/test.pl,Examples/in"
               outfiles="out_ITR,err_ITR"
               outtopdir="" />
    </chain>
  </iterator>
</task>

• Example of a task containing 100 chains, each consisting of 1 program.
• Program-specific monitoring is activated; results are returned via a MySQL connection.
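To make the iterator semantics concrete, the sketch below expands a task description of this shape into one attribute set per program. It is not how BOSS parses its XML: the function `expand_task`, the plain-text substitution of the iterator name and the treatment of `end` as exclusive (chosen because it yields the 100 chains stated on the slide) are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SAMPLE = """<task>
  <iterator name="ITR" start="0" end="100" step="1">
    <chain scheduler="glite" rtupdater="mysql" ch_tool_name="jobExecutor">
      <program exec="test.pl" args="ITR" stdout="out_ITR" stderr="err_ITR"
               program_type="test" stdin="in" />
    </chain>
  </iterator>
</task>"""


def expand_task(xml_text: str) -> List[Dict[str, str]]:
    """Return one attribute dict per program, with the iterator name substituted."""
    root = ET.fromstring(xml_text)
    programs = []
    for iterator in root.iter("iterator"):
        name = iterator.get("name")
        start, end, step = (int(iterator.get(key)) for key in ("start", "end", "step"))
        for value in range(start, end, step):  # assumption: end is exclusive
            for chain in iterator.iter("chain"):
                for program in chain.iter("program"):
                    # Naive textual substitution of the iterator name in every attribute.
                    attrs = {k: v.replace(name, str(value)) for k, v in program.attrib.items()}
                    attrs["scheduler"] = chain.get("scheduler", "")
                    programs.append(attrs)
    return programs
```

With the sample above, `expand_task(SAMPLE)` produces 100 entries, and the entry for iteration 5 has `stdout` set to `out_5`. Plain substitution would also rewrite any attribute that happens to contain the iterator name as a substring, so a real parser would need a stricter rule.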


Status and plans

• Significant new functionality has been identified and is being actively integrated into BOSS.
• The latest release, v3.6, includes much of the new functionality:
– Tasks, jobs and executables.
– XML task description.
– C++ and Python APIs.
– Basic executable chaining: currently only the default chainer, with linear chaining.
– Separate logging and monitoring DBs.
– DBs implemented in either MySQL or SQLite (more to come).
– Optional RT monitoring with multiple implementations: currently only MonaLisa and direct MySQL connections (to be deprecated).
• Still to be done:
– Allow chainer plugins.
– Implement more RT monitoring solutions, e.g. R-GMA.
– Finalize the API.
– Look at writing the wrapper in a scripting language, e.g. Perl or Python.