WMS SW Admin and User Guide (server11.infn.it/workload-grid/docs/DataGrid-01-TEN-0118-1_2.pdf)


DataGrid

WP1 - WMS SOFTWARE ADMINISTRATOR AND USER GUIDE

Document identifier: DataGrid-01-TEN-0118-1_2

Date: 24/11/2003

Work package: WP1

Partner: Datamat SpA

Document status

Deliverable identifier:

Abstract: This note provides the administrator and user guide for the WP1 WMS software.

IST-2000-25182 PUBLIC 1 / 146


Delivery Slip

Name Partner Date Signature

From Fabrizio Pacini Datamat SpA 24/11/2003

Verified by Stefano Beco Datamat SpA 24/11/2003

Approved by

Document Log

Issue Date Comment Author

0_0 21/12/2001 First draft Fabrizio Pacini

0_1 14/01/2002 Draft Fabrizio Pacini

0_2 24/01/2002 Draft Fabrizio Pacini

0_3 05/02/2002 Draft Fabrizio Pacini

0_4 15/02/2002 Draft Fabrizio Pacini

0_5 08/04/2002 Draft Fabrizio Pacini

0_6 13/05/2002 Fabrizio Pacini

0_7 19/07/2002 Fabrizio Pacini

0_8 16/09/2002 Fabrizio Pacini

0_9 03/12/2002 Fabrizio Pacini

1_0 13/06/2003 First issue for Release 2.0 Fabrizio Pacini, Massimo Sgaravatto

1_1 04/09/2003 Fabrizio Pacini

1_2 24/11/2003 Fabrizio Pacini


Document Change Record

Issue Item Reason for Change

0_1 General update

− Take into account changes in the rpm generation procedure.

− Add missing info about daemons (RB/JSS/CondorG) starting accounts

− Some general corrections

0_2 General Update

− Add Cancelling and Cancel Reason information.

− Add OUTPUTREADY job state.

− Add new profile rpms.

− Remove /etc/workload* shell scripts.

− Add summary map table (user / daemon).

− Add CEId format check.

− Add new job cancel notification.

0_3 General Update

− Modified RB/JSS start-up procedure

− Add gridmap-file users/groups issues

− Add proxy certificate usage by daemons

− Job attribute CEId changed to SubmitTo

− Add DGLOG_TIMEOUT setting

− Add workload-profile and userinterface-profile rpms


0_4 General Update

− Add configure option --enable-wl for system configuration files

− Add installation checking option --with-globus for Globus to the Workload configure

− Add new Information Index configure options

− Remove edg-profile and edg-user-env rpms from II and UI dependencies

− Add security configuration rpms for all the Certificate Authorities to UI dependencies

− Add new parameters to RB configuration file

− Add new Job Exit Code field to the returned job status info

− Remove dependency on SWIG in the userinterface binary rpm

0_5 General Update

− Modify command options syntax (getopt-like style)

− Add MyProxy server and client package installation/utilisation

− Modify job cancel notification

− Add Userguide rpm

0_6 General Update

− Modify configure options for the various components

− UI commands modified to use python2 executable

− Clarify myproxy usage

− Explain how RB/LB addresses in the UI config file are used by the commands

− Add --logfile option to the UI commands

0_7 General Update

− Modify configure options for the various components

− Clarify UI commands --notify option usage

− Add make test target for UI


0_8 General Update

− Specified dependencies of profile rpms

− Update needed env vars for UI

− Explain how to include default constraints in the job requirements

− Explain that the lc field in the ReplicaCatalog address is now mandatory

− Explain how to specify wildcards and special chars in "Arguments" in the JDL expression

0_9 General Update

− Defaults for Rank and Requirements in the UI config file

− Added reference to the “.BrokerInfo” file document

− other.CEId in Requirements vs --resource option

− Explain MyProxy Server configuration

− Added description of new parameters in RB configuration file

− RB/JSS databases clean-up procedure added

− Explain usage of RetryCount JDL attribute

− Better explain how to specify wildcards and special chars in "Arguments" in the JDL expression

− Updated reference to JDL Attributes note

− Added Annex on Submission failures analysis

1_0 General Update

− Refer to WMS release 2

1_1 General Update

− Description of new UI commands options for interactive jobs (--nogui, --nolisten)

− Added annexes section on job re-submission


1_2 General Update

− Add voms client APIs rpms among WMS components dependencies

− Update commands description due to the integration with VOMS

− Remove proxy credential creation from UI commands

− Remove --hours option from UI edg-job-submit command

Files

Software Products User files

Word 2000 DataGrid-01-TEN-0118-1_2.doc

Acrobat Exchange 5.0 DataGrid-01-TEN-0118-1_2.pdf


CONTENT

1. INTRODUCTION
  1.1. OBJECTIVES OF THIS DOCUMENT
  1.2. APPLICATION AREA
  1.3. APPLICABLE DOCUMENTS AND REFERENCE DOCUMENTS
  1.4. DOCUMENT EVOLUTION PROCEDURE
  1.5. TERMINOLOGY
2. EXECUTIVE SUMMARY
3. WORKLOAD MANAGEMENT SYSTEM OVERVIEW
  3.1. DEPLOYMENT OF THE WMS SOFTWARE
4. INSTALLATION AND CONFIGURATION
  4.1. LOGGING AND BOOKKEEPING SERVICES
    4.1.1. Required software
      4.1.1.1. LB local-logger and LB APIs
      4.1.1.2. LB Server
    4.1.2. Configuration
      4.1.2.1. LB Local-Logger
      4.1.2.2. LB Server
    4.1.3. Environment Variables
  4.2. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM
    4.2.1. Required software
      4.2.1.1. Globus installation and configuration
        4.2.1.1.1. Condor-G installation and configuration
      4.2.1.2. ClassAd installation and configuration
      4.2.1.3. Boost installation and configuration
      4.2.1.4. Replica Manager installation and configuration
    4.2.2. Configuration
      4.2.2.1. Configuration of the “common” attributes
      4.2.2.2. NS configuration
      4.2.2.3. WM configuration
      4.2.2.4. JC configuration
      4.2.2.5. LM configuration
    4.2.3. Environment variables
    4.2.4. Other requirements and configurations for the “RB node”
      4.2.4.1. Customized Gridftp server
      4.2.4.2. Grid-mapfile
      4.2.4.3. Disk Quota
  4.3. SECURITY SERVICES
    4.3.1. MyProxy Server
    4.3.2. Proxy renewal service
      4.3.2.1. Required software
      4.3.2.2. Configuration
      4.3.2.3. Environment variables
  4.4. GRID ACCOUNTING SERVICES
    4.4.1. Required software
      4.4.1.1. Creating the MySQL databases for the HLR server
      4.4.1.2. Creating the MySQL database for the PA server
    4.4.2. Configuration
      4.4.2.1. Configuring the HLR server
      4.4.2.2. Configuring the PA server
      4.4.2.3. Configuring the ATM client software
    4.4.3. Environment variables
  4.5. USER INTERFACE
    4.5.1. Required software
      4.5.1.1. Python Command Line Interface
      4.5.1.2. C++ API
      4.5.1.3. Java API
      4.5.1.4. Java GUI
    4.5.2. RPM installation
      4.5.2.1. Python Command Line Interface
      4.5.2.2. C++ API
      4.5.2.3. Java API
      4.5.2.4. Java GUI
    4.5.3. Configuration
      4.5.3.1. Python Command Line Interface
      4.5.3.2. Java GUI
    4.5.4. Environment variables
      4.5.4.1. Python Command Line Interface
      4.5.4.2. Java GUI
5. OPERATING THE SYSTEM
  5.1. LB LOCAL-LOGGER
    5.1.1. Starting and stopping daemons
    5.1.2. Troubleshooting
  5.2. LB SERVER
    5.2.1. Starting and stopping daemons
    5.2.2. Creating custom indices
    5.2.3. Purging the LB database
    5.2.4. Experimental R-GMA Interface
    5.2.5. Troubleshooting
  5.3. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM
    5.3.1. Starting and stopping NS, WM, JC and LM daemons
    5.3.2. NS, WM, JC, LM troubleshooting
  5.4. PROXY RENEWAL
    5.4.1. Starting and stopping daemon
    5.4.2. Troubleshooting
  5.5. PURGER
  5.6. GRID ACCOUNTING
    5.6.1. Starting and stopping daemon
      5.6.1.1. HLR server
      5.6.1.2. PA Server
    5.6.2. HLR server administration
      5.6.2.1. Creating a Fund account
      5.6.2.2. Creating a Group account
      5.6.2.3. Creating a User account
      5.6.2.4. Creating a Resource account
      5.6.2.5. Deleting accounts
    5.6.3. Troubleshooting
  5.7. USER INTERFACE (JAVA GUI)
    5.7.1. Troubleshooting
6. USER GUIDE
  6.1. USER INTERFACE
    6.1.1. Security
      6.1.1.1. MyProxy
        6.1.1.1.1. MyProxyClient
    6.1.2. Common behaviours
      6.1.2.1. The --input option
    6.1.3. Commands description
      6.1.3.1. edg-job-submit
      6.1.3.2. edg-job-get-output
      6.1.3.3. edg-job-list-match
      6.1.3.4. edg-job-cancel
      6.1.3.5. edg-job-status
      6.1.3.6. edg-job-get-logging-info
      6.1.3.7. edg-job-attach
      6.1.3.8. edg-job-get-chkpt
7. ANNEXES
  7.1. JDL ATTRIBUTES
  7.2. JOB STATUS DIAGRAM
  7.3. JOB EVENT TYPES
  7.4. SUBMISSION FAILURES ANALYSIS
  7.5. JOB RESUBMISSION AND RETRYCOUNT
  7.6. WILDCARD PATTERNS
  7.7. THE MATCH MAKING ALGORITHM
    7.7.1. Direct Job Submission
    7.7.2. Job submission without data-access requirements
    7.7.3. Job submission with data-access requirements


1. INTRODUCTION

This document provides a guide to the installation, configuration and usage of the WP1 WMS software released within the DataGrid project.

1.1. OBJECTIVES OF THIS DOCUMENT

The goal of this document is to describe the complete process by which the WP1 WMS software can be installed and configured on the DataGrid test-bed platforms. Guidelines for operating the whole system and accessing the functionality it provides are also given.

1.2. APPLICATION AREA

Administrators can use this document as a basis for installing, configuring and operating the WP1 WMS software. Users can refer to the User Guide chapter for accessing the provided services through the User Interface.

1.3. APPLICABLE DOCUMENTS AND REFERENCE DOCUMENTS

Applicable documents

[A1] JDL Attributes - DataGrid-01-TEN-0142-0_0 - 13/06/2003 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0142-0_0.{doc,pdf})

[A2] Definition of the architecture, technical plan and evaluation criteria for the resource co-allocation framework and mechanisms for parallel job partitioning (http://www.infn.it/workload-grid/docs/DataGrid-01-D1.4-0127-1_0.{doc,pdf})

[A3] DataGrid Accounting System - Architecture v 1.0 (http://www.infn.it/workload-grid/docs/DataGrid-01-TED-0126-1_0.pdf)

[A4] Logging and Bookkeeping Architecture - DataGrid-01-TED-0141 (http://lindir.ics.muni.cz/dg_public/lb_draft2_formatted.pdf)

[A5] Job Description Language HowTo - DataGrid-01-TEN-0102-02 - 17/12/2001 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2.pdf)

[A6] The Glue CE Schema (http://www.cnaf.infn.it/~sergio/datatag/glue/v11/CE/index.htm)

Reference documents

[R1] The Resource Broker Info file - DataGrid-01-TEN-0135-0_0 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0135-0_0.{doc,pdf})

[R2] LB-API Reference Document - DataGrid-01-TED-0139-0_0 (http://lindir.ics.muni.cz/dg_public/lb_api.pdf)

[R3] Job Partitioning and Checkpointing - DataGrid-01-TED-0119-0_3 (https://edms.cern.ch/file/347730/1/DataGrid-01-TED-0119-0_3.pdf)


1.4. DOCUMENT EVOLUTION PROCEDURE

The content of this document will be subject to modification according to the following events:

• Comments received from DataGrid project members,

• Changes/evolutions/additions to the WMS components.

1.5. TERMINOLOGY

Definitions

Condor    Condor is a High Throughput Computing (HTC) environment that can manage very large collections of distributively owned workstations.

Globus    The Globus Toolkit is a set of software tools and libraries aimed at the building of computational grids and grid-based applications.

Glossary

class-ad  Classified advertisement
CE        Computing Element
CLI       Command Line Interface
DB        Data Base
DGAS      Datagrid Grid Accounting Service
EDG       European DataGrid
FQDN      Fully Qualified Domain Name
GIS       Grid Information Service, aka MDS
GSI       Grid Security Infrastructure
GUI       Graphical User Interface
HLR       Home Location Register
IS        Information Service
job-ad    Class-ad describing a job
JA        Job Adapter
JC        Job Controller
JDL       Job Description Language
LB        Logging and Bookkeeping Service
LM        Log Monitor
LRMS      Local Resource Management System
MDS       Metacomputing Directory Service, aka GIS
MPI       Message Passing Interface
NS        Network Server
OS        Operating System
PA        Price Authority
PID       Process Identifier
PM        Project Month
RB        Resource Broker
SE        Storage Element
SI00      Spec Int 2000
SMP       Symmetric Multi Processor
TBC       To Be Confirmed
TBD       To Be Defined
UI        User Interface
VO        Virtual Organisation
VOMS      Virtual Organisation Membership server
WM        Workload Manager
WMS       Workload Management System
WP        Work Package


2. EXECUTIVE SUMMARY

This document comprises the following main sections:

Section 3: Workload Management System Overview
Briefly introduces the revised Workload Management System architecture and discusses the deployment of the WMS components.

Section 4: Installation and Configuration
Describes the changes that need to be made to the environment and the steps to be performed for installing the WMS software on the test-bed target platforms. The resulting installation tree structure is detailed for each system component.

Section 5: Operating the System
Provides the procedures for starting and stopping the processes and utilities of the WMS components.

Section 6: User Guide
Describes, in Unix man-page style, all the User Interface commands that give the user access to the services provided by the WMS.

Section 7: Annexes
Expands on topics introduced in the User Guide section that help the user better understand system behaviour.


3. WORKLOAD MANAGEMENT SYSTEM OVERVIEW

The revised (release 2) architecture of the EDG Workload Management System (WMS), which is described in detail in [A2], is represented in Figure 1.

Figure 1: UML diagram describing the new (rel. 2) WMS architecture

The User Interface (UI) is the component that allows users to access the functionalities offered by the Workload Management System.

The Network Server (NS) is a generic network daemon responsible for accepting incoming requests from the UI (e.g. job submission, job removal). Valid requests are then passed to the Workload Manager.

The Workload Manager (WM) is the core component of the Workload Management System. Given a valid request, it has to take the appropriate actions to satisfy it. To do so, it may need support from other components, which are specific to the different request types. Each component that supports the Workload Manager provides a class whose interface is inherited from a Helper class. Essentially the Helper, given a JDL expression, returns a modified one, which represents the output of the required action. For example, if the request was to find a suitable resource for a job, the input JDL expression is the one specified by the user, and the output is the JDL expression augmented with the CE choice.

The Resource Broker (RB) or Matchmaker is one of the classes offering support to the Workload Manager. It provides a matchmaking service: given a JDL expression (e.g. for a job submission), it finds the resources that best match the request. It interacts with the Information Service and with the data management services.

The Job Adapter (JA) is responsible for making the final “touches” to the JDL expression for a job before it is passed to CondorG for the actual submission. So, besides preparing the CondorG submission file, this module is also responsible for creating the wrapper script and the appropriate execution environment on the CE worker node (this includes the transfer of the input and output sandboxes).

CondorG is the module responsible for performing the actual job management operations (job submission, job removal, etc.) issued on request of the Workload Manager.
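The Helper idea described above (a JDL expression goes in, a modified JDL expression comes out) can be illustrated with a minimal Python sketch. This is not the WMS implementation: the class names, the `resolve` method, the `ce_id` attribute and the use of a plain dictionary in place of a real JDL/ClassAd expression are all assumptions made for illustration only.

```python
from abc import ABC, abstractmethod


class Helper(ABC):
    """Hypothetical helper interface: JDL expression in, modified copy out."""

    @abstractmethod
    def resolve(self, jdl: dict) -> dict:
        """Return a modified copy of the given JDL expression."""


class MatchmakerHelper(Helper):
    """Toy matchmaker that picks the candidate CE with the highest rank."""

    def __init__(self, candidates: dict):
        # candidates maps a CE identifier to a numeric rank (made-up data)
        self.candidates = candidates

    def resolve(self, jdl: dict) -> dict:
        result = dict(jdl)  # the user's expression is left untouched
        # "ce_id" is a hypothetical attribute name for the CE choice
        result["ce_id"] = max(self.candidates, key=self.candidates.get)
        return result


if __name__ == "__main__":
    helper = MatchmakerHelper({"ce-a.example.org": 1.0, "ce-b.example.org": 2.5})
    print(helper.resolve({"Executable": "/bin/hostname"}))
```

The point of the common interface is that the Workload Manager can treat every supporting component the same way, whatever the request type.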
The Log Monitor (LM) is responsible for “watching” the CondorG log file, intercepting interesting events concerning active jobs, i.e. events affecting the job state machine (e.g. job done, job cancelled, etc.), and triggering the appropriate actions.

The Logging and Bookkeeping (LB) service stores logging and bookkeeping information concerning events generated by the various components of the WMS. Using this information, the LB service keeps a state machine view of each job.

As described in section 4.3, a proxy renewal mechanism is available to ensure that a valid user proxy exists within the WMS for the whole lifetime of a job. This proxy renewal service relies on the MyProxy software.

The DataGrid Accounting System (DGAS) is another functionality offered by the WMS, described in detail in [A3]. DGAS has two main purposes:

• Economic accounting for Grid users and resources: users pay for resource usage, while resources earn virtual credits by executing user jobs.

• Economic brokering: helps the Resource Broker choose the most suitable resource for a given job according to the current price of a resource and a pre-defined economic policy.

The HLR (Home Location Register) server is responsible for implementing the first item in the list. The second is covered by the Price Authority (PA) service. The suggested configuration is to have one HLR server and one PA server per VO.

3.1. DEPLOYMENT OF THE WMS SOFTWARE

As far as the deployment of the WMS software is concerned, the following types of “boxes” can be identified:

• The User Interface machine, which is used to interact with the functionalities of the WMS: the WMS User Interface software has to be installed on this machine, together with part of the DGAS HLR client software (the DGAS job-auth client software) and the LB C and C++ APIs.

• The “RB node”, where the Network Server, the Workload Manager and its helpers (Matchmaker and Job Adapter), the Job Controller, CondorG, the Log Monitor, the LB local logger, the LB C API and the Proxy Renewal components have to be installed.

• The LB server, where the LB software has to be installed.

• The Computing Elements (CEs): on the gatekeeper node of each CE the LB local logger software and part of the DGAS HLR client software (the DGAS ATM client software) have to be installed. On the WNs it is necessary to install the checkpointing API and the C and sh LB APIs.

• The MyProxy server host.

• The HLR server, where the HLR server and the PA client software have to be installed.

• The PA server, where the PA server software has to be installed.

These are the EDG WP1 RPMs needed on the various “machines”.

User Interface machine:

edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-common-api-java-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-dgas-hlr-ui-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-ui-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-ui-api-java-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-ui-gui-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2

edg-wl-config-X.Y.Z-K_gcc3_2_2

edg-wl-bypass-X.Y-Z.i486.rpm

“RB node”:

edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-interactive-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-locallogger-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-proxyrenewal-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-wm-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-dgas-hlr-jobAuthClient-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-globus-gridftp-X.Y.Z-gxx3_2_2.i486.rpm

edg-wl-bypass-X.Y-Z.i486.rpm

LB server:

edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-lbserver-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-lbserver-rgma-X.Y.Z-K_gcc3_2_2.i486.rpm

Gatekeeper of CE:

edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-dgas-hlr-ATMClient-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-locallogger-X.Y.Z-K_gcc3_2_2.i486.rpm

WN of CE:

edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-logging-api-sh-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm

DGAS HLR server:

edg-wl-dgas-hlr-server-X.Y.Z-K_gcc3_2_2.i486.rpm

edg-wl-dgas-hlr-server-admin-X.Y.Z-K_gcc3_2_2.i486.rpm

DGAS PA server:

edg-wl-dgas-pa-server-X.Y.Z-K_gcc3_2_2.i486.rpm

Note that this list specifies only the RPMs containing EDG WP1 software, i.e. the software needed by these RPMs is not listed (details can be found in section 4). These different types of services do not strictly need to be installed on different machines: a single machine can host several services (for example, the PA server and the HLR server could run on the same machine).
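Since one machine may host several services, the set of packages to install on it is the union of the per-role lists. The following Python sketch illustrates this for a subset of the roles above. The role keys and the `packages_for` function are hypothetical names; the package base names are taken from the lists in this section, with the version suffix (X.Y.Z-K_gcc3_2_2.i486.rpm) left out.

```python
# Package base names per role, copied from the lists in this section.
# The role keys are labels invented for this sketch.
RPMS_BY_ROLE = {
    "lb_server": ["edg-wl-config", "edg-wl-lbserver", "edg-wl-lbserver-rgma"],
    "ce_gatekeeper": [
        "edg-wl-config",
        "edg-wl-dgas-hlr-ATMClient",
        "edg-wl-locallogger",
    ],
    "hlr_server": ["edg-wl-dgas-hlr-server", "edg-wl-dgas-hlr-server-admin"],
    "pa_server": ["edg-wl-dgas-pa-server"],
}


def packages_for(roles):
    """Return the sorted union of package base names for the given roles."""
    needed = set()
    for role in roles:
        needed.update(RPMS_BY_ROLE[role])
    return sorted(needed)


if __name__ == "__main__":
    # A machine hosting both the HLR server and the PA server
    for name in packages_for(["hlr_server", "pa_server"]):
        print(name)
```

Packages shared by two roles (such as edg-wl-config) appear only once in the result.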

4. INSTALLATION AND CONFIGURATION

This section deals with the procedures for installing and configuring the WP1 WMS components on the target platforms. For each component, the list of dependencies (i.e. the software required on the same machine for the component to run) is reported first, followed by the installation procedure, which is described through step-by-step examples. A description of the needed configuration items and environment variable settings is also provided.

Note that, since the RPMs are generated using gcc 3.2 and RPM 4.0.2, the same configuration is expected on the target platforms.

4.1. LOGGING AND BOOKKEEPING SERVICES

From the installation point of view, the LB services can be split into three main components:

• The LB services responsible for accepting messages from their sources and forwarding them to the logging and/or bookkeeping servers, referred to as the LB local-logger services.

• The LB services responsible for accepting messages from the LB local-logger services, saving them on permanent storage and supporting queries generated by the consumer API, referred to as the LB server services.

• The LB APIs (C, C++, sh).

The LB local-logger services must be installed on all the machines hosting processes that push information into the LB system, i.e. the “RB node” and the gatekeeper machines of the CEs. An exception is the submitting machine (i.e. the machine running the User Interface), on which this component can be installed but is not mandatory. The LB server services instead need to be installed only on a server machine. The LB APIs should be installed on the UI machine (C and C++ APIs), on the “RB node” (C and C++ APIs) and on the CE worker nodes (C and sh APIs).

4.1.1. Required software

4.1.1.1. LB local-logger and LB APIs

For the installation of the LB local-logger and the LB APIs, the only software required is the Globus Toolkit 2.2 (actually only the GSI RPMs are needed). Globus 2.2 RPMs are available at http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/

4.1.1.2. LB Server

For the installation of the LB server, the Globus Toolkit 2.2 is required (actually only the GSI RPMs are needed). Globus 2.2 RPMs are available at http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/

Besides the Globus Toolkit, for the LB server to work properly it is also necessary to install MySQL Distribution 4.0.1 or higher. Packages and documentation about MySQL can be found at: http://www.mysql.org. The MySQL RPMs for pc-linux-gnu (i686) are also available at http://datagrid.in2p3.fr/distribution/external/RPMS/. At least the packages MySQL-4.0.x and MySQL-client-4.0.x have to be installed for creating and configuring the LB database.

The LB server stores the logging data in a MySQL database, which must therefore be created. The following assumes that the database and the server daemon (edg-wl-bkserverd) run on the same machine, which is considered to be secure, i.e. no database authentication is used. In a different set-up the procedure has to be adjusted accordingly, and a secure database connection (via ssh tunnel etc.) has to be established.

The action list below contains the placeholders DB_NAME and USER_NAME: real values have to be substituted. They form the database connection string required when invoking some of the LB daemons. The suggested value for DB_NAME is 'lbserver20' and for USER_NAME is 'lbserver'. These values are also the compiled-in defaults (i.e. when they are used, the database connection string need not be specified at all).

The following steps require MySQL root privileges:

1. Create the database:

mysqladmin -u root -p create DB_NAME

where DB_NAME is the name of the database.

2. Create a dedicated LB database user:

mysql -u root -p -e 'grant create,drop,alter,index,select,insert,update,delete on DB_NAME.* to USER_NAME@localhost'

where USER_NAME is the name of the user running the LB server daemon.

3. Create the database tables:

mysql -u USER_NAME DB_NAME < server.sql

where server.sql is a file containing the SQL commands for creating the needed tables. server.sql can be found in the directory “<install path>/etc” created by the LB server rpm installation.
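The three steps above can be assembled for concrete values of DB_NAME and USER_NAME with a small Python sketch. It only builds the command lines as strings and does not execute anything; the function name `lb_db_setup_commands` and its parameters are hypothetical, and the defaults are the suggested values quoted above.

```python
import shlex


def lb_db_setup_commands(db_name="lbserver20", user_name="lbserver",
                         sql_file="server.sql"):
    """Build the three shell command lines of the LB database set-up."""
    grant = (
        "grant create,drop,alter,index,select,insert,update,delete "
        f"on {db_name}.* to {user_name}@localhost"
    )
    return [
        # Step 1: create the database
        f"mysqladmin -u root -p create {shlex.quote(db_name)}",
        # Step 2: create the dedicated LB database user
        f"mysql -u root -p -e {shlex.quote(grant)}",
        # Step 3: create the tables from server.sql
        f"mysql -u {shlex.quote(user_name)} {shlex.quote(db_name)}"
        f" < {shlex.quote(sql_file)}",
    ]


if __name__ == "__main__":
    for command in lb_db_setup_commands():
        print(command)
```

Printing the commands first makes it easy to review them before running them by hand with MySQL root privileges.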

For the LB server it is also necessary to install expat (recommended release is 1.95.2 or higher), which can be downloaded from: http://datagrid.in2p3.fr/distribution/external/RPMS/.

4.1.2. Configuration

4.1.2.1. LB Local-Logger

The LB local logger has no configuration file.

4.1.2.2. LB Server

The LB server has no configuration file. By default the LB server is configured with only very few indices, so that a limited set of queries is supported. Upon installation the server administrator may decide to create additional indices to support further expected user query types. See section 5.2.2 for details.

4.1.3. Environment Variables

All LB components recognize the following environment variables in the same way GSI handles them:

• X509_USER_KEY
• X509_USER_CERT
• X509_CERT_DIR
• X509_USER_PROXY

However, in the case of the LB daemons, the recommended way of specifying the locations of the security files is to use the --cert, --key and --CAdir options explicitly: GSI searches through various default locations, and finding a wrong credential file in one of them may cause unexpected behaviour.

The Logging library, i.e. the library that is linked into the UI, NS, WM, JC and LM and is called from the job-wrapper script, recognizes the following environment variables (besides the X509_* ones listed above):

• GLOBUS_HOSTNAME
Hostname that will appear as the source of logged events

• EDG_WL_LOG_DESTINATION
<hostname>:<port> of the local-logger to use

• EDG_WL_LOG_TIMEOUT
Timeout for standard (asynchronous) logging

• EDG_WL_LOG_SYNC_TIMEOUT
Timeout for synchronous logging

• EDG_WL_QUERY_SERVER
Default server to query (the prefix in the JobId overrides this setting)

• EDG_WL_QUERY_TIMEOUT
Timeout for queries

All of them have reasonable defaults and need not be set in normal operation (details can be found in [R2]). On the submitting machine, if the variable EDG_WL_LOG_DESTINATION is not set, it is dynamically assigned by the UI so that it refers to the machine where the NS runs. The timeout of the Logging library functions is automatically increased with respect to the default value (as recommended for non-local logging).
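As an illustration of the <hostname>:<port> format of EDG_WL_LOG_DESTINATION, the following Python sketch reads and validates the variable. It is not part of the Logging library: the function name `log_destination` is hypothetical, and the host name and port used in the example are made up.

```python
import os


def log_destination(environ=None):
    """Return (hostname, port) from EDG_WL_LOG_DESTINATION, or None if unset."""
    environ = os.environ if environ is None else environ
    value = environ.get("EDG_WL_LOG_DESTINATION")
    if not value:
        return None  # unset: the caller falls back to the default
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <hostname>:<port>, got {value!r}")
    return host, int(port)


if __name__ == "__main__":
    # Example value only; host name and port are invented for this sketch.
    print(log_destination({"EDG_WL_LOG_DESTINATION": "rb.example.org:9002"}))
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.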

4.2. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM

As introduced in section 3, the Network Server (NS), the Workload Manager (WM), the Job Controller (JC) and the Log Monitor (LM) are dealt with together, since they always reside on the same host (the “RB node”) and are consequently distributed by means of a single rpm.

4.2.1. Required software

For the installation of NS, WM, JC and LM, the following products are expected to be installed:

• Globus
• Condor-G
• ClassAd library
• Boost
• LB local logger (whose installation and configuration is discussed in section 4.1)
• ReplicaManager from the EDG WP2 distribution

4.2.1.1. Globus installation and configuration

The required Globus release is 2.2. The Globus software can be downloaded from: http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/. Please note that the “RB node” should run a gridftp server (actually a “customized” one: see section 4.2.4.1), while it should not run a Globus gatekeeper.

4.2.1.1.1. Condor-G installation and configuration

The required Condor-G release is CondorG 6.5.1, which can be found at the following URL: http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/condor/RPMS/. Some additional configuration steps have to be performed in the Condor configuration file pointed to by the CONDOR_CONFIG environment variable, which is set during installation. In the $CONDOR_CONFIG file the following attributes need to be modified:

RELEASE_DIR = /opt/condor
LOCAL_DIR = $ENV(GLOBUS_LOCATION)/var/condor
CONDOR_ADMIN = <a valid e-mail address of the Condor-G administrator>

and the following entries need to be added:

AUTHENTICATION_METHODS = CLAIMTOBE
ENABLE_GRID_MONITOR = TRUE
GRID_MONITOR = $(SBIN)/grid_monitor.sh
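A quick way to verify that the three added entries are present is sketched below in Python. This is a simplified reader written for illustration, not a Condor tool: it only understands plain `NAME = value` lines, ignores comments, does not handle line continuations, and compares names case-insensitively (an assumption of this sketch). The function names are hypothetical.

```python
# Entries that, according to the text above, must be added to $CONDOR_CONFIG
REQUIRED = {
    "AUTHENTICATION_METHODS": "CLAIMTOBE",
    "ENABLE_GRID_MONITOR": "TRUE",
    "GRID_MONITOR": "$(SBIN)/grid_monitor.sh",
}


def parse_condor_config(text):
    """Parse simple NAME = value lines; later definitions override earlier ones."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip().upper()] = value.strip()
    return settings


def missing_entries(text):
    """Return the required names that are absent or have a different value."""
    settings = parse_condor_config(text)
    return [name for name, value in REQUIRED.items()
            if settings.get(name) != value]


if __name__ == "__main__":
    sample = "RELEASE_DIR = /opt/condor\nENABLE_GRID_MONITOR = TRUE\n"
    print(missing_entries(sample))
```

In practice the text would be read from the file that CONDOR_CONFIG points to.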

4.2.1.2. ClassAd installation and configuration

The ClassAd software required is a “customized” classads-0.9.4 release, available in rpm format (to be installed as root) at: http://datagrid.in2p3.fr/distribution/external/RPMS. The ClassAd library documentation can be found at the following URL: http://www.cs.wisc.edu/condor/classad.

4.2.1.3. Boost installation and configuration

The required release of the Boost C++ libraries is 1.29 (or higher). The Boost documentation can be found at the following URL: http://www.boost.org, while the software is available in rpm format (to be installed as root) at: http://datagrid.in2p3.fr/distribution/external/RPMS

4.2.1.4. Replica Manager installation and configuration

The Replica Manager RPMs that must be installed are:

edg-gsoap-base-1.0.3-1.i386.rpm

edg-replica-location-client-c++-1.2.8-1.i386.rpm

edg-replica-optimization-client-c++-1.2.9-1.i386.rpm

edg-replica-metadata-catalog-client-c++-1.2.8-1.i386.rpm

edg-replica-manager-client-c++-1.0.6-1.i386.rpm

After the RPM installation, the configuration files for the various VOs in <install-dir>/etc/edg-replica-manager have to be set up (please refer to the WP2 documentation for details).

4.2.2. Configuration

Once the rpm installation has been performed, the NS, WM, JC and LM services must be properly configured. This can be done by editing the file ${EDG_WL_CONFIG_DIR}/edg_wl.conf. If $EDG_WL_CONFIG_DIR has not been defined, the edg_wl.conf file is looked for first in /opt/edg/etc, then in /etc, and then in /usr/local/etc. This configuration file has the following structure (ClassAd based):

[
  Common = [
    …
  ];
  NetworkServer = [
    …
  ];
  WorkloadManager = [
    …
  ];
  JobController = [
    …
  ];
  LogMonitor = [
    …
  ];
]

Therefore the configuration file is composed of 5 parts:

• one for the “common” attributes (i.e. those “used” by all services)
• one for the configuration of the NS
• one for the configuration of the WM
• one for the configuration of the JC
• one for the configuration of the LM
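The lookup order for edg_wl.conf described in section 4.2.2 can be sketched in Python as follows. The function name `find_edg_wl_conf` and the injectable `exists` parameter are hypothetical; the sketch also assumes that, when EDG_WL_CONFIG_DIR is defined, only that directory is consulted, which the text does not state explicitly.

```python
import os
import posixpath

# Directories searched, in this order, when EDG_WL_CONFIG_DIR is not defined
FALLBACK_DIRS = ("/opt/edg/etc", "/etc", "/usr/local/etc")


def find_edg_wl_conf(environ=None, exists=os.path.isfile):
    """Return the path of the first edg_wl.conf found, or None."""
    environ = os.environ if environ is None else environ
    config_dir = environ.get("EDG_WL_CONFIG_DIR")
    # Assumption of this sketch: no fallback when the variable is defined
    candidates = [config_dir] if config_dir else list(FALLBACK_DIRS)
    for directory in candidates:
        path = posixpath.join(directory, "edg_wl.conf")
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_edg_wl_conf())
```

The `exists` parameter defaults to a real file check and can be replaced by a stub to try the lookup order without touching the file system.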

4.2.2.1. Configuration of the “common” attributes

As introduced in the previous section, it is first of all necessary to edit the:

Common = [
  …
];

part of the configuration file, in order to set the attributes “used” by all the services. The only “common” attribute that must be specified is:

• DGUser refers to the name of the user account running the NS, WM, JC and LM services. E.g.:

DGUser = "${EDG_WL_USER}";

4.2.2.2. NS configuration

The Network Server is configured by editing the configuration file and setting the appropriate attributes in the:

NetworkServer = [
  …
];

section. They are listed hereafter, grouped according to the functionality they relate to:

• II_Contact, II_Port, II_DN and II_Timeout refer to the II service and respectively represent the hostname where this service is running, the port number, the base DN to be used when querying the II (i.e. the distinguished name used as the starting place for searches in the information service), and the timeout in seconds applied when the II is queried. E.g.:

II_Contact = "grid001f.cnaf.infn.it";

II_Port = 2170;

II_DN = "mds-vo-name=local, o=grid";

II_Timeout = 60;

• Gris_Port, Gris_DN and Gris_Timeout respectively represent the port number where the GRISes run, the base DN to be used when querying the GRISes, and the timeout in seconds applied when the GRISes are queried. Actually the port and the base DN to be used are specified in the information service schema, and the NS relies on these values: the Gris_Port and Gris_DN attributes specified in the configuration file are considered only if, for some reason, they are not published in the information service. E.g.:

Gris_Port = 2135;

Gris_DN = "mds-vo-name=local, o=grid";

Gris_Timeout = 20;

• ListeningPort is the port used by the NS to listen for requests coming from the User Interface. The default value for this parameter is:

ListeningPort = 7772;

• MasterThreads defines the maximum number of simultaneous connections with User Interfaces. The default value is:

MasterThreads = 8;

• DispatcherThreads defines the maximum number of simultaneous connections with the Workload Manager (used to forward the incoming requests). The default value is:

DispatcherThreads = 10;

• SandboxStagingPath represents the pathname of the root sandbox directory, i.e. the complete pathname of the directory where the RB creates both the input and output sandbox directories and stores the “.Brokerinfo” file. Please take care that this directory does not have the sticky bit (o+t) set. E.g.:

SandboxStagingPath = "/disk/sandbox";

• EnableQuotaManagement is a Boolean attribute that specifies whether the user quota has to be checked to verify that there is enough space to store the input sandbox (see section 4.2.4.3). E.g.:

EnableQuotaManagement = true;

• MaxInputSandboxSize defines the maximum size (in bytes) of the input sandbox allowed per job. If the size of the input sandbox for a given job is greater than MaxInputSandboxSize, then the job is refused. E.g.:

MaxInputSandboxSize = 10000000;

• EnableDynamicQuotaAdjustment and QuotaAdjustmentAmount refer to the “dynamic” quota (see section 4.2.4.3). If EnableDynamicQuotaAdjustment is true and a disk quota has not been set by the system administrator for the considered user, then a virtual quota equal to QuotaAdjustmentAmount is added for that user for each submitted job, and it is released when the job has completed its execution. E.g.:

EnableDynamicQuotaAdjustment = true;

QuotaAdjustmentAmount = 10000;

• QuotaInsensibleDiskPortion represents the percentage of the disk storing the sandbox directories that the administrator wants to keep unassigned. So if the free disk space is less than the specified percentage, no new jobs can be accepted (see section 4.2.4.3). E.g.:

QuotaInsensibleDiskPortion = 2.0;

• LogFile and LogLevel refer to the NS log file. LogFile is the name of this file, while LogLevel specifies the verbosity of the information the NS records in its log file: 0 is the minimum value (no debug information is written in the log file), while 6 is the maximum value (full debug). E.g.:

LogFile = "${EDG_WL_TEMP}/NetworkServer/log";

LogLevel = 6;
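The two acceptance rules stated above for MaxInputSandboxSize and QuotaInsensibleDiskPortion can be written down as a small Python sketch. This is not the NS code: the function name, its parameters and the returned messages are hypothetical, and per-user quota handling (section 4.2.4.3) is deliberately left out.

```python
def accept_input_sandbox(sandbox_bytes, max_sandbox_bytes,
                         free_disk_bytes, total_disk_bytes,
                         insensible_portion_pct):
    """Return (accepted, reason) for a job's input sandbox."""
    # Rule 1: a sandbox larger than MaxInputSandboxSize is refused
    if sandbox_bytes > max_sandbox_bytes:
        return False, "input sandbox larger than MaxInputSandboxSize"
    # Rule 2: no new jobs when free space drops below the reserved percentage
    free_pct = 100.0 * free_disk_bytes / total_disk_bytes
    if free_pct < insensible_portion_pct:
        return False, "free disk space below QuotaInsensibleDiskPortion"
    return True, "ok"


if __name__ == "__main__":
    # 5 MB sandbox, 10 MB limit, half of the disk free, 2% reserved
    print(accept_input_sandbox(5_000_000, 10_000_000, 50, 100, 2.0))
```

Note that a sandbox whose size is exactly equal to the limit is accepted, since the text only refuses sizes greater than MaxInputSandboxSize.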

4.2.2.3. WM configuration

The Workload Manager is configured by editing the configuration file and setting the appropriate attributes in the:

WorkloadManager = [
  …
];

section. They are listed hereafter, grouped according to the functionality they relate to:

• PipeDepth defines the maximum size of the buffer between the dispatcher and the worker threads. E.g.:

PipeDepth = 10;

• NumberOfWorkerThreads represents the size of the worker thread pool. The default value is:

NumberOfWorkerThreads = 1;

• DispatcherType defines the type of the input queue of requests. There should be no reason to change the provided default value (“filelist”).

• Input refers to the WM input “queue” of requests. There should be no reason to change the provided default value. E.g.:

Input = "${EDG_WL_TMP}/workload_manager/input.fl";

• MaxRetryCount specifies the maximum number of times the WM has to try to re-schedule and re-submit the job in case the submission to the CE failed (e.g. Globus down on the CE, network problems, etc.). When a job is submitted specifying the RetryCount attribute in the JDL, the number of submission retries attempted by the WM is at most the minimum of RetryCount and MaxRetryCount. The default value for this configuration parameter is:

MaxRetryCount = 10;

• LogFile and LogLevel refer to the WM log file. LogFile is the name of this file, while LogLevel specifies the verbosity of the information the WM records in its log file: 0 is the minimum value (no debug information is written in the log file), while 6 is the maximum value (full debug). E.g.:

LogFile = "${EDG_WL_TEMP}/manager/log/events.log";

LogLevel = 6;

Please note that all directories specified in the WM configuration file are expected to exist already: since the WM does not create directories, any that are missing have to be created at installation time.
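The interplay between the RetryCount JDL attribute and the MaxRetryCount configuration parameter described above amounts to taking a minimum. A minimal Python sketch follows; the function name is hypothetical, and the behaviour when RetryCount is absent from the JDL (falling back to MaxRetryCount) is an assumption of this sketch, not something stated in the text.

```python
def effective_retry_count(max_retry_count, jdl_retry_count=None):
    """Upper bound on the submission retries attempted by the WM."""
    if jdl_retry_count is None:
        # Assumption: without RetryCount in the JDL, only the
        # configured MaxRetryCount bounds the retries
        return max_retry_count
    return min(jdl_retry_count, max_retry_count)


if __name__ == "__main__":
    print(effective_retry_count(10, 3))   # the JDL asks for fewer retries
    print(effective_retry_count(10, 25))  # the JDL asks for more than allowed
```

So a user can lower the number of retries for a job, but cannot raise it above the limit set by the administrator.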

4.2.2.4. JC configuration

The Job Controller is configured by editing the configuration file and setting the appropriate attributes in the:

JobController = [
  …
];

section. They are listed hereafter, grouped according to the functionality they relate to:

• CondorSubmit, CondorRemove, CondorQuery, CondorSubmitDag and CondorRelease respectively specify the pathnames of the condor_submit, condor_rm, condor_q, condor_submit_dag and condor_release Condor-G commands.

• SubmitFileDir defines the directory where the temporary files (the CondorG submit file and the job wrapper scripts) are created. E.g.:

SubmitFileDir = "${EDG_WL_TMP}/jobcontrol/submit";

• OutputFileDir defines the directory where the standard output and standard error files of CondorG are temporarily saved. E.g.:

OutputFileDir = "${EDG_WL_TMP}/jobcontrol/condorio";

• Input refers to the JC input “queue” of requests. There should be no reason to change the default value.

• LogFile and LogLevel refer to the JC log file. LogFile is the name of this file, while LogLevel specifies the verbosity of the information the JC records in its log file: 0 is the minimum value (no debug information is written in the log file), while 6 is the maximum value (full debug). E.g.:

LogFile = "${EDG_WL_TEMP}/jobcontrol/log/events.log";

LogLevel = 6;

• ContainerRefreshThreshold represents the number of jobs after which the JC has to re-read the IdRepositoryName LM file (see section 4.2.2.5). There should be no reason to change the provided default value:

ContainerRefreshThreshold = 1000;

• UseFakeForProxy and UseFakeForReal are used for debugging purposes. Therefore there should be no reason to modify the default values:

UseFakeForProxy = false;

UseFakeForReal = false;

4.2.2.5. LM configuration

The Log Monitor is configured by editing the configuration file and setting the appropriate attributes in the:

LogMonitor = [
  …
];

section. They are listed hereafter, grouped according to the functionality they relate to:

• CondorLogDir defines the name of the directory where the CondorG log files (i.e. the files where the events for the submitted jobs are recorded) are created. E.g.:

CondorLogDir = "${EDG_WL_TMP}/LM/CondorGlog";

• JobsPerCondorLog represents the number of jobs whose events are recorded in each single CondorG log file. So every JobsPerCondorLog jobs, the CondorG log file is changed. E.g.:

JobsPerCondorLog = 1000;

• MainLoopDuration defines how often the LM reads the CondorG log files: they are read every MainLoopDuration seconds. E.g.:

MainLoopDuration = 10;

• CondorLogRecycleDir defines the directory where the CondorG log files already read by the LM are stored. E.g.:

CondorLogRecycleDir = "${EDG_WL_TMP}/LM/RecCondorGlog";

• MonitorInternalDir is the directory where files needed by the LM service for internal purposes are created and stored. E.g.:

MonitorInternalDir = "${EDG_WL_TMP}/LM/internal";

• IdRepositoryName is the name of a file used by the LM for internal purposes (it keeps the dgjobid – Condorid correspondences). E.g.:

IdRepositoryName = "irepository.dat";

• AbortedJobsTimeout represents the timeout (in seconds) after which a cancelled job is forgotten by the LM (useful when the job hangs in the CondorG queue). E.g.:

AbortedJobsTimeout = 600;

• LogFile and LogLevel refer to the LM log file. LogFile is the name of this file, while LogLevel specifies the verbosity of the information the LM records in it: 0 is the minimum value (no debug information is written in the log file), while 6 is the maximum value (full debug). E.g.:

LogFile = "${EDG_WL_TMP}/LM/log/events.log";

LogLevel = 6;
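The effect of the job-count and timeout attributes described above can be illustrated with a minimal Python sketch. The function names are hypothetical, jobs are assumed to be numbered from zero in submission order, and the actual LM implementation is not described in this guide and may differ.

```python
def condor_log_index(job_sequence_number: int, jobs_per_condor_log: int = 1000) -> int:
    """Index of the CondorG log file that would record a job's events.

    Assumption: jobs are counted from 0 in submission order and a new
    log file is started every jobs_per_condor_log jobs.
    """
    if jobs_per_condor_log <= 0:
        raise ValueError("jobs_per_condor_log must be positive")
    return job_sequence_number // jobs_per_condor_log


def is_forgotten(cancelled_at: float, now: float, aborted_jobs_timeout: int = 600) -> bool:
    """True once a cancelled job has been pending for the timeout (seconds)."""
    return (now - cancelled_at) >= aborted_jobs_timeout


if __name__ == "__main__":
    # With the example value of 1000, jobs 0..999 share the first log file.
    print(condor_log_index(999), condor_log_index(1000))
    # With the example value of 600, a job cancelled at t=0 is dropped at t=600.
    print(is_forgotten(0.0, 599.0), is_forgotten(0.0, 600.0))
```

The default arguments reuse the example values given for the two attributes; any other positive values work the same way.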

4.2.3. Environment variables

Environment variables that have to be set (or can be set) for the NS, WM, JC and LB services are listed below:

• EDG_WL_LOG_DESTINATION The Logging library, i.e. the library providing APIs for logging job events to the LB, reads its immediate logging destination from the environment variable EDG_WL_LOG_DESTINATION (see section 4.1.3).

• CONDOR_CONFIG This variable has to refer to the CondorG configuration file, usually /opt/condor/etc/condor_config (see section 4.2.1.1.1).

• EDG_WL_CONFIG_DIR As explained in section 4.2.2, this variable refers to the directory containing the configuration file (edg_wl.conf) for the WMS services running on the “RB node”.

• GRIDMAP This variable must refer to the grid-mapfile (usually /etc/grid-security/grid-mapfile).

• LD_LIBRARY_PATH Should include $GLOBUS_LOCATION/lib, the Boost lib directory and the gcc 3.2 lib directory.

• EDG_LOCATION Should refer to the EDG software installation directory (usually /opt/edg); it is needed for the WP2 services used by the RB.

In addition, any environment variables used in the NS/WM/JC/LM configuration sections have to be set as well. In any case, all variables that must be defined for the proper execution of the WMS services are set by the relevant start-up scripts.
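As an illustration of why variables referenced in the configuration sections must be defined, the following Python sketch expands a ${NAME} reference of the kind used in the LM examples. The function name and the value chosen for EDG_WL_TMP are hypothetical; how the WMS services themselves react to an undefined variable is not stated in this guide.

```python
from string import Template


def expand_config_value(value: str, environment: dict) -> str:
    """Expand ${NAME} references in a configuration value.

    Raises KeyError when a referenced variable is missing from
    `environment`, which is the situation the text warns against.
    """
    return Template(value).substitute(environment)


if __name__ == "__main__":
    # Hypothetical setting, chosen only for this example.
    env = {"EDG_WL_TMP": "/var/edgwl"}
    print(expand_config_value("${EDG_WL_TMP}/LM/CondorGlog", env))
    try:
        expand_config_value("${EDG_WL_TMP}/LM/CondorGlog", {})
    except KeyError as missing:
        print("undefined variable:", missing)
```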

4.2.4. Other requirements and configurations for the “RB node”

4.2.4.1. Customized Gridftp server

To ensure the “security” of the input and output sandboxes, a “customized” Gridftp server has to run on the “RB node”. With this “patched” Gridftp server, the sandbox files transferred to the “RB node” belong to the group of the user running the NS, WM, JC and LM services (usually edguser) and have rwxrwx--- as mask. In this way a user cannot access sandbox files belonging to other users.


In order to install this “customized” Gridftp server, the following RPM has to be installed:

edg-wl-globus-gridftp-X.Y.Z-K.i486.rpm

By default this rpm installs the software in the “/opt/edg” directory. After having installed the software, it may be necessary to modify the line:

magicgroup edguser all

in the file:

<INSTALL-PREFIX>/etc/ftpaccess

if edguser is not the group of the user running the WMS services on the “RB node”. To start or stop this patched gridftp server, the following command has to be issued:

/etc/rc.d/init.d/edg-wl-ftpd start/stop

4.2.4.2. Grid-mapfile

The Globus grid-mapfile (usually located in /etc/grid-security) on the “RB node” must be filled with the certificate subjects of all the users allowed to use the WMS functionalities. Users mapped in the grid-mapfile have to belong to groups which, for security reasons, must be different from the group of the dedicated user (e.g. edguser) running the NS, WM, JC and LM services.

4.2.4.3. Disk Quota

When a job is submitted to the WMS, the NS first checks whether there is enough space to store its input sandbox files. Moreover, as explained in section 4.2.2.2, the NS checks that the input sandbox size is not greater than the value specified as MaxInputSandboxSize in the NS configuration section; otherwise the job is refused.

As introduced in section 4.2.2.2, it is also possible to enable a disk quota check (by setting EnableQuotaManagement=true in the NS configuration section). In this case, when a user submits a job, the NS checks the disk quotas for that particular local account (the one defined in the grid-mapfile) to see whether the input sandbox files can be moved to the “RB node”. So, if the disk quota check has been enabled in the NS configuration file, the disk quota option has to be enabled, and disk quotas have to be set for the various users allowed to submit jobs to the WMS (i.e. the ones defined in the grid-mapfile).

If the NS configuration parameter EnableDynamicQuota is set to true and no disk quota has been set by the system administrator for the considered user, then a “dynamic” quota equal to QuotaAdjustmentAmount (another NS configuration parameter) is added for that user for each submitted job, and it is released when the job has completed its execution.

It is also possible to define (via the QuotaInsensibleDiskPortion NS configuration parameter) a portion of the disk, expressed as a percentage. If the free space of the disk used for storing input and output sandboxes is less than this percentage value, then no new jobs can be submitted.
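The checks described in this section can be summarised with a minimal Python sketch. It is only an illustration under stated assumptions: the function name, the argument names, the use of bytes as unit and the order of the checks are all hypothetical, and the dynamic quota mechanism is left out.

```python
from typing import Optional


def can_accept_sandbox(
    sandbox_size: int,
    max_input_sandbox_size: int,
    disk_free: int,
    disk_total: int,
    quota_insensible_disk_portion: float,
    user_quota_left: Optional[int] = None,
) -> bool:
    """Return True if an input sandbox passes the checks described above.

    Sizes are assumed to be in bytes; quota_insensible_disk_portion is a
    percentage. user_quota_left is None when quota management is disabled.
    """
    # The sandbox must not exceed MaxInputSandboxSize.
    if sandbox_size > max_input_sandbox_size:
        return False
    # There must be enough space to store the sandbox files.
    if sandbox_size > disk_free:
        return False
    # No new jobs when free space drops below the reserved percentage.
    if 100.0 * disk_free / disk_total < quota_insensible_disk_portion:
        return False
    # With EnableQuotaManagement, the user's remaining quota must suffice.
    if user_quota_left is not None and sandbox_size > user_quota_left:
        return False
    return True
```

For example, with a 1000-byte disk of which 500 bytes are free and a 2% reserved portion, a 10-byte sandbox is accepted, while the same sandbox is refused if the user has only 5 bytes of quota left.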

4.3. SECURITY SERVICES

The EDG WMS software relies on the GSI mechanisms for authentication. This means that a valid user proxy must exist within the WMS (not necessarily in the UI) for the whole lifetime of a job. A secure way of achieving this (instead of using long-lived proxies) is to exploit the proxy renewal (PR) mechanisms, which rely on the MyProxy package. The underlying idea is that the user registers in a MyProxy server a valid long-term certificate proxy that will be used by the WMS to perform a periodic credential renewal for the submitted job; in this way the user is no longer obliged to create proxies with a very long lifetime when submitting long-running jobs.

The MyProxy credential repository system consists of a server and a set of client tools that can be used to delegate and retrieve credentials to and from a server. Normally, a user would start by using the myproxy-init client program along with the permanent credentials necessary to contact the server and delegate a set of proxy credentials to the server along with authentication information and retrieval restrictions.

The Proxy Renewal (PR) service maintains users' proxy certificates and, by periodically contacting the Myproxy server, keeps the certificates valid. The service communicates only through a local unix socket, so it must be installed on the same machine where the services registering proxies run (i.e. on the “RB node”). Therefore:

• On the “RB node” it is necessary to deploy (via the edg-wl-proxyrenewal-X.Y.Z-K.i486.rpm RPM) the PR software and the Myproxy libraries (as discussed in section 4.3.2).

• On the “Myproxy server” node it is necessary to deploy the Myproxy Server software, discussed in section 4.3.1.

The MyProxy Toolkit is available at the following URL: http://myproxy.ncsa.uiuc.edu/

MyProxy version v0.5.3 is recommended for the Datagrid environment.

4.3.1. MyProxy Server

myproxy-server is a daemon that runs on a trusted, secure host and manages a database of proxy credentials for use from remote sites. Proxies have a lifetime that is controlled by the myproxy-init program. When a proxy is requested from the myproxy-server via the myproxy-get-delegation command, further delegation ensures that the lifetime of the new proxy is shorter than that of the original, to enforce greater security.

A configuration file is responsible for maintaining a list of trusted portals and users that can access this service. To configure a Myproxy server, one must restrict the users that are allowed to store credentials within the Myproxy server and, more importantly, the clients that are allowed to retrieve credentials from it. To do that, it is necessary to edit a configuration file (/etc/myproxy-server.conf) and add the specific services allowed to carry out proxy renewal. An example of myproxy-server.conf is below:

accepted_credentials "/C=CZ/O=CESNET/*"

accepted_credentials "/C=IT/O=INFN/*"

authorized_renewers \

"/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0380.cern.ch"

authorized_renewers \

"/C=IT/O=INFN/OU=gatekeeper/L=CNAF/CN=grid010g.cnaf.infn.it/[email protected]"

As can be seen, it contains the subject names of all the resources that are allowed to renew credentials (the recognized “RB nodes”) and the prefixes of the subject names of the users that are allowed to store proxies in the repository.

In order to launch the myproxy-server daemon, it is necessary to run the binary <prefix>/sbin/myproxy-server. The program will start up and background itself. It accepts connections on TCP port 7512, forking off a separate child to handle each incoming connection. It logs information via the syslog service. A start-up script (/etc/init.d/myproxy) is provided to run the service on reboot.
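The role of the accepted_credentials prefixes in the example configuration can be illustrated with a short Python sketch. It assumes shell-style wildcard matching of the subject name against each pattern; the exact matching rules applied by the MyProxy server may differ, and the function name and the user subjects below are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the example myproxy-server.conf above.
ACCEPTED_CREDENTIALS = ["/C=CZ/O=CESNET/*", "/C=IT/O=INFN/*"]


def may_store_credentials(subject: str, patterns=ACCEPTED_CREDENTIALS) -> bool:
    """True if the certificate subject matches at least one pattern.

    Assumption: '*' matches any run of characters, as in shell globbing.
    """
    return any(fnmatchcase(subject, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical subjects, used only to exercise the patterns.
    print(may_store_credentials("/C=IT/O=INFN/OU=Personal Certificate/CN=Some User"))
    print(may_store_credentials("/C=FR/O=CNRS/CN=Another User"))
```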

4.3.2. Proxy renewal service

4.3.2.1. Required software

Globus 2.2 (which can be downloaded from http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/) and the Myproxy libraries (version v0.5.3 recommended) are needed for the Proxy Renewal service.

4.3.2.2. Configuration

The PR daemon has no configuration file.


4.3.2.3. Environment variables

The PR daemon recognizes the following environment variables in the same way the GSI handles them:

• X509_USER_KEY

• X509_USER_CERT

• X509_CERT_DIR

• X509_USER_PROXY

4.4. GRID ACCOUNTING SERVICES

4.4.1. Required software

As introduced in section 3, for the DGAS services it is necessary to install:

• The HLR server software plus the PA client software on a HLR server machine

• The PA server software on a PA server machine

• The DGAS job-auth client software on the UI machine

• The ATM client software on the gatekeeper node of the CE

For the DGAS software, the Globus Toolkit 2.2 software is required (actually only the GSI RPMs are needed). Globus 2 RPMs are available at http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/.

Besides the Globus Toolkit, for both PA and HLR servers it is also necessary to install MySQL Distribution 4.0.1 or higher. Packages and documentation about MySQL can be found at http://www.mysql.org. The MySQL RPMs for pc-linux-gnu (i686) are also available at http://datagrid.in2p3.fr/distribution//external/RPMs.

The MySQL database can basically be run in two ways: as a daemon waiting both for remote TCP calls and local Unix-socket calls, or waiting for local calls only. DGAS doesn't need MySQL to wait for incoming TCP calls, so if this feature is not needed for other purposes, it is strongly suggested to instruct MySQL to listen on the Unix socket only. In order to skip networking in MySQL, the following line:

skip-networking

has to be added to the MySQL configuration file (usually /etc/my.cnf) under the daemon part of the configuration file (i.e. [mysqld]).

4.4.1.1. Creating the MySQL databases for the HLR server

The HLR stores its data in two databases that need to be installed and configured. The files needed to install the databases can be found under <install-prefix>/etc and are:

• hlr.sql Main DB used by the HLR engine.

• hlr_tmp.sql DB where the HLR stores temporary data.

In order to install the HLR DBs, it is first necessary to create them:

# mysqladmin create hlr

# mysqladmin create hlr_tmp

and then the files listed above have to be used to create the tables in the databases:

# mysql hlr < hlr.sql

# mysql hlr_tmp < hlr_tmp.sql

4.4.1.2. Creating the MySQL database for the PA server

The PA stores its data in one database that needs to be installed and configured. The file needed to install the database can be found under <install-path>/etc, and is:

• pa.sql Main DB used by the PA engine.

In order to install the PA DB, it first has to be created:

# mysqladmin create pa

and then the file mentioned above can be used to create the tables in the database:


# mysql pa < pa.sql

4.4.2. Configuration

4.4.2.1. Configuring the HLR server

The main options for the HLR server daemon can be set up in its configuration file, which can be referenced when starting the daemon (see section 5.6.1.1). The file is usually found in $EDG_WL_LOCATION/etc. The configuration file accepts parameters in the form:

item = "value"

with one item-value pair per line. These are the parameters that can be specified in the HLR configuration file:

• hlr_sql_server The host where the HLR databases are installed (usually the localhost)

• hlr_sql_user The user running the HLR database

• hlr_sql_password The HLR database user password

• hlr_sql_dbname The HLR database name

• hlr_tmp_sql_dbname The HLR tmp database name

• hlr_port The HLR server listening port

• hlr_log The HLR log file name
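The item = "value" format described above is simple enough to be read with a few lines of Python. The sketch below is an assumption-laden illustration, not the DGAS parser: the function name is hypothetical, lines that do not match the pattern (blank lines, possible comments) are silently skipped, and the sample values are invented for the example.

```python
import re

# One parameter per line: name, '=', value enclosed in double quotes.
_PARAM_LINE = re.compile(r'^\s*(\w+)\s*=\s*"(.*)"\s*$')


def parse_item_value_config(text: str) -> dict:
    """Collect item = "value" pairs into a dictionary of strings."""
    params = {}
    for line in text.splitlines():
        match = _PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


if __name__ == "__main__":
    # Hypothetical values; only the parameter names come from this guide.
    sample = '''
hlr_sql_server = "localhost"
hlr_sql_dbname = "hlr"
hlr_tmp_sql_dbname = "hlr_tmp"
hlr_log = "/var/log/hlr.log"
'''
    print(parse_item_value_config(sample))
```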


4.4.2.2. Configuring the PA server

The main options for the PA server daemon can be set up in its configuration file, which can be referenced when starting the daemon (see section 5.6.1.2). The file is usually found in <install-path>/etc. These are the parameters that can be specified in the PA configuration file:

• pa_sql_server The host where the PA database is installed (usually the localhost)

• pa_sql_user The user running the PA database

• pa_sql_password The PA database user password

• pa_sql_dbname The PA database name

• pa_port The PA server listening port

• pa_log The PA log file name

4.4.2.3. Configuring the ATM client software

As mentioned before, the DGAS ATM client software has to be installed and configured on the gatekeeper node of each CE. It is necessary to specify, via a configuration file, the full contact string for the Resource PA and HLR. This configuration file, referenced by the DGAS ATM Client API, should usually be <install-path>/etc/edg-wl-ATMClient-dgas.conf and it has to specify the following two attributes:

• res_acct_PA_id The resource PA, in the format:

res_acct_PA_id = "hostname:portnumber:X509CertSubject"

• res_acct_bank_id The resource HLR, in the format:

res_acct_bank_id = "hostname:portnumber:X509CertSubject"
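The "hostname:portnumber:X509CertSubject" contact string used by both attributes can be split into its three parts as in the following Python sketch. The class and function names are hypothetical, and the host, port and subject in the example are invented.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    hostname: str
    port: int
    cert_subject: str


def parse_contact(contact: str) -> Contact:
    """Split a contact string into host, port and certificate subject.

    Only the first two colons are treated as separators, because the
    X509 subject itself may contain ':' characters.
    """
    hostname, port, subject = contact.split(":", 2)
    return Contact(hostname, int(port), subject)


if __name__ == "__main__":
    # Hypothetical contact string for a resource PA.
    print(parse_contact("pa.example.org:56567:/C=IT/O=INFN/CN=host/pa.example.org"))
```

A string with fewer than three fields, or with a non-numeric port, raises ValueError, which makes a malformed attribute easy to spot.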

4.4.3. Environment variables

The grid accounting services don’t rely on any environment variables.


4.5. USER INTERFACE

This section describes the steps needed to install and configure the User Interface, which is the software module of the WMS allowing the user to access the main services made available by the components of the scheduling sub-layer. The UI software is distributed in 4 different packages:

− The python command line interface

− The C++ API

− The Java API

− The Java GUI

4.5.1. Required software

All the packages listed above depend on the Globus Toolkit software. The required release is 2.2 from the VDT distribution. It can be downloaded from http://datagrid.in2p3.fr/distribution/globus/vdt-1.1.8/globus/RPMS/. The needed rpms are listed below:

− vdt_globus_essentials-EDGVDT1.1.8-5.i386.rpm

− vdt_globus_sdk-EDGVDT1.1.8-5.i386.rpm

− vdt_compile_globus_core-EDGVDT1.1.8-1.i386.rpm

− globus-initialization-2.2.4-2.noarch.rpm

Moreover, the set of security configuration rpms for all the Certificate Authorities in Testbed2, available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/, has to be installed together with the rpm to be used for renewing your certificate for your CA. This is available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/local/.

Lastly, the MyProxy package should be installed on the UI node in order to allow users to take advantage of the proxy-renewal feature for long-running jobs. The corresponding rpm can be found at http://datagrid.in2p3.fr/distribution/external/RPMS and is named as follows:

− myproxy-gcc32dbg-client-0.5.3-1.i386.rpm

4.5.1.1. Python Command Line Interface

In order to install the CLI, apart from the proper user interface rpm:

− edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm

the common UI configuration rpm:


− edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm

− edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm

and the configuration LCFGng objects:

− edg-lcfg-cliconfig

− edg-lcfg-uicmnconfig

the following WMS and third-party packages are needed:

− edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

− edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

− edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

− edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)

Moreover the VOMS client C++ API rpm:

− voms-api_gcc3_2_2-1.1.39-1_RH7.3

available at http://datagrid.in2p3.fr/distribution/autobuild/i386-rh7.3-gcc3.2.2/wp6/RPMS/ is needed on the UI, together with the VO specific rpm containing the credentials of the VOMS server for the given VO (one rpm per VO is needed):

− edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm

available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/.

Lastly, the Python interpreter, version 2.2.2, must also be installed on the submitting machine. The rpms for this package are available at http://datagrid.in2p3.fr/distribution/redhat-7.3/updates/RPMS as:

− python2-2.2.2-11.7.3.i386.rpm

− tkinter2-2.2.2-11.7.3.i386.rpm

Information about python and the package sources can be found at www.python.org.


4.5.1.2. C++ API

The UI C++ API is distributed within the following rpm:

− edg-wl-ui-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

Moreover the following WMS and third-party packages are needed:

− edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

− edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

− edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

− edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2.i486.rpm (Checkpointing API)

− edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)

The VOMS client C++ API rpm:

− voms-api_gcc3_2_2-1.1.39-1_RH7.3

available at http://datagrid.in2p3.fr/distribution/autobuild/i386-rh7.3-gcc3.2.2/wp6/RPMS/ is also needed on the UI, together with the VO specific rpm containing the credentials of the VOMS server for the given VO (one rpm per VO is needed):

− edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm

available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/.

Lastly, the following external rpms, all available at http://datagrid.in2p3.fr/distribution/external/RPMS, need to be installed on the UI node. They are the customised Condor classads library (see 4.2.1.2 for details):

− classads-g3-0.9.4-vh8.i486.rpm

and the Boost C++ libraries (see 4.2.1.3 for details):

− boost-g3-1.29.1-vh6.i486.rpm

4.5.1.3. Java API

The UI Java API is distributed within the following rpm:

− edg-wl-ui-api-java-X.Y.Z-K.i486.rpm

Moreover the following WMS and third-party packages are needed:

− edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

− edg-wl-common-api-java-X.Y.Z-K.i486.rpm (requestAd jar)

− edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

− edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

− edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)

Lastly, the following external rpms all available at http://datagrid.in2p3.fr/distribution/external/RPMS need to be installed on the UI node. They are the customised Condor classads java library version 1.1:

− classads-jar-1.1-2.i386.rpm

the Java 2 Development Kit version 1.4 (or greater):

− j2sdk-1.4.1_01-fcs.i586.rpm

− j2sdk_profile-1.4.1_01-1.noarch.rpm

the Globus Java CoG Kit version 1.0 alpha:

− cog-jar-1.0-1_alpha.i386.rpm

the Log4J package version 1.2.6:

− log4j-1.2.6-1jpp.noarch.rpm

and the EDG Java Security API:

− bouncycastle-jdk14-1.19-2

− edg-java-security-1.4.1-1

4.5.1.4. Java GUI

In order to install the Java Graphical User Interface, apart from the proper GUI rpm:

− edg-wl-ui-gui-X.Y.Z-K.i486.rpm

the Java API rpm:

− edg-wl-ui-api-java-X.Y.Z-K.i486.rpm

the common UI configuration rpm:

− edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm

− edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm

and the configuration LCFGng objects:

− edg-lcfg-guiconfig

− edg-lcfg-uicmnconfig

the following WMS and third-party packages are needed:

− edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm (WMS common lib)

− edg-wl-common-api-java-X.Y.Z-K.i486.rpm (requestAd jar)

− edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm (LB consumer C++ API)

− edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm (LB client C API)

− edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm (Condor bypass)

Moreover, the following external rpms all available at http://datagrid.in2p3.fr/distribution/external/RPMS need to be installed on the UI node. They are the customised Condor classads java library version 1.1:

− classads-jar-1.1-2.i386.rpm

the Java 2 Development Kit version 1.4 (or greater):

− j2sdk-1.4.1_01-fcs.i586.rpm

− j2sdk_profile-1.4.1_01-1.noarch.rpm

the Globus Java CoG Kit version 1.0 alpha:


− cog-jar-1.0-1_alpha.i386.rpm

the Log4J package version 1.2.6:

− log4j-1.2.6-1jpp.noarch.rpm

and the EDG Java Security API:

− bouncycastle-jdk14-1.19-2

− edg-java-security-1.4.1-1

4.5.2. RPM installation

All the needed rpms can be downloaded with the command

wget -nd -r <URL>/<rpm name>

and installed with

rpm -ivh <rpm name>

As stated at the beginning of section 4.5.1, all UI packages require the installation of the Globus Toolkit software release 2.2 from the VDT distribution and of the MyProxy package. It is important to remark that, since the RPMs are generated using gcc 3.2 and RPM 4.0.2, the same configuration is expected on the target platforms. Details for each UI package are reported below.

4.5.2.1. Python Command Line Interface

In order to install the python command line User Interface, the following commands have to be issued with root privileges:

rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm


By default the rpms install the software in the “/opt/edg” directory. Moreover the VOMS API rpms have to be installed as follows:

rpm -ivh voms-api_gcc3_2_2-1.1.39-1_RH7.3

rpm -ivh edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm

Of course the python2.2 rpms have to be installed too if they are not present on the machine (they should be included in the RH 7.3 distribution).

4.5.2.2. C++ API

In order to install the UI C++ API, the following commands have to be issued with root privileges:

rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-chkpt-api-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-ui-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

By default the rpms install the software in the “/opt/edg” directory. Moreover the VOMS API and the classads and boost libraries have to be installed as follows:

rpm -ivh voms-api_gcc3_2_2-1.1.39-1_RH7.3

rpm -ivh edg-voms-vo-<VO name>-X.Y-Z.noarch.rpm

rpm -ivh classads-g3-0.9.4-vh8.i486.rpm

rpm -ivh boost-g3-1.29.1-vh6.i486.rpm

4.5.2.3. Java API

In order to install the UI Java API, the following commands have to be issued with root privileges:

rpm -ivh j2sdk-1.4.1_01-fcs.i586.rpm

rpm -ivh j2sdk_profile-1.4.1_01-1.noarch.rpm

rpm -ivh classads-jar-1.1-2.i386.rpm


rpm -ivh cog-jar-1.0-alpha-1.0-1_alpha.i386.rpm

rpm -ivh log4j-1.2.6-1jpp.noarch.rpm

rpm -ivh bouncycastle-jdk14-1.19-2

rpm -ivh edg-java-security-1.4.1-1

rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-common-api-java-X.Y.Z-K.i486.rpm

rpm -ivh edg-wl-ui-api-java-X.Y.Z-K.i486.rpm

By default the WMS rpms install the software in the “/opt/edg” directory.

4.5.2.4. Java GUI

In order to install the Java GUI, the following commands have to be issued with root privileges:

rpm -ivh j2sdk-1.4.1_01-fcs.i586.rpm

rpm -ivh j2sdk_profile-1.4.1_01-1.noarch.rpm

rpm -ivh classads-jar-1.1-2.i386.rpm

rpm -ivh cog-jar-1.0-alpha-1.0-1_alpha.i386.rpm

rpm -ivh log4j-1.2.6-1jpp.noarch.rpm

rpm -ivh bouncycastle-jdk14-1.19-2

rpm -ivh edg-java-security-1.4.1-1

rpm -ivh edg-wl-common-api-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-cpp-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-logging-api-c-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-bypass-2.5.3-4_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-common-api-java-X.Y.Z-K.i486.rpm

rpm -ivh edg-wl-ui-api-java-X.Y.Z-K.i486.rpm


rpm -ivh edg-wl-config-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm

rpm -ivh edg-wl-ui-gui-X.Y.Z-K.i486.rpm

By default the WMS rpms install the software in the “/opt/edg” directory.

4.5.3. Configuration

The User Interface C++ and Java API packages have no configuration. The Python command line interface and the GUI instead share a common configuration section that allows setting VO-specific parameters. This information is provided in the file edg_wl_ui.conf. There is one such file for each of the supported EDG VOs. These files are located in the directory

$EDG_WL_LOCATION/etc/<VO name>/

i.e. there is one directory per VO. The VO name is lower case. These directories are created by the LCFGng object called edg-lcfg-uicmnconfig. If the installation is not performed using LCFGng, the common configuration rpm (edg-wl-ui-config-X.Y.Z-K_gcc3_2_2.i486.rpm) has to be installed first; it creates the directory:

$EDG_WL_LOCATION/etc/vo_template

containing the file edg_wl_ui.conf. You must then create in $EDG_WL_LOCATION/etc a directory for each needed VO and copy into it the file

$EDG_WL_LOCATION/etc/vo_template/edg_wl_ui.conf

suitably updated. The edg_wl_ui.conf file is a classad containing the following fields:

− VirtualOrganisation this is a string representing the name of the virtual organisation the file refers to. It should match the name of the directory containing the file. This parameter is mandatory.

− NSAddresses this is a string or a list of strings representing the address or list of addresses (<host fqdn>:<port>) of the Network Servers available for the given VO. Job submission is performed towards the first NS in the list; in case of failure it is retried on each listed NS until it succeeds or the end of the list is reached. This parameter is mandatory.

− LBAddresses this is a string or a list of strings representing the address or list of addresses (<host fqdn>:<port>) of the LB servers available for the given VO. When a job is submitted, the UI randomly chooses one LB server from the list and uses it to generate the job identifier, so that all information related to that job is managed by the chosen LB server. This distributes the load over several LB servers. This parameter is mandatory.

− HLRLocation this is a string representing the address (<host fqdn>:<port>:<X509contact string>) of the HLR for the given VO. The HLR is the service responsible for managing the economic transactions and the accounts of users and resources. This parameter is optional and is not present in the file by default. If present, it makes the UI automatically add the HLRLocation JDL attribute to the job description (if not specified by the user), which enables accounting.

− MyProxyServer this is a string representing the MyProxy server address (<host fqdn>) for the given VO. This parameter is optional and is not present in the file by default. If present, it makes the UI automatically add the MyProxyServer JDL attribute to the job description (if not specified by the user), which enables proxy renewal. If the myproxy client package is installed on the UI node, this parameter should be set equal to the MYPROXY_SERVER environment variable.
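The selection rules described for NSAddresses and LBAddresses can be sketched as follows. This is only an illustration under stated assumptions, not the UI's actual code: the function names and the try_submit callback are hypothetical, and only the logic (ordered retry over the Network Servers, random choice of one LB server) is taken from the field descriptions above.

```python
import random
from typing import Callable, Optional, Sequence


def pick_lb_server(lb_addresses: Sequence[str], rng: random.Random) -> str:
    """Choose one LB server at random; it then manages all data for the job."""
    return rng.choice(list(lb_addresses))


def submit_with_failover(
    ns_addresses: Sequence[str],
    try_submit: Callable[[str], bool],
) -> Optional[str]:
    """Try each Network Server in list order and return the first that accepts.

    try_submit is a hypothetical callback that returns True on success.
    None is returned when the end of the list is reached without success.
    """
    for address in ns_addresses:
        if try_submit(address):
            return address
    return None


# Sample <host fqdn>:<port> values, used here purely as illustration.
NS = ["ibm139.cnaf.infn.it:7772", "grid013f.cnaf.infn.it:9772"]
LB = ["ibm139.cnaf.infn.it:9000", "fox.to.infn.it:9000"]
```

Listing the preferred Network Server first matters, because the remaining entries are only contacted after a failure, whereas the order of the LB servers has no effect on the random choice.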

Hereafter is provided an example of the configuration file for the “atlas” Virtual Organisation. The file path is hence $EDG_WL_LOCATION/etc/atlas/edg_wl_ui.conf

[

VirtualOrganisation = "atlas";

NSAddresses = {

"ibm139.cnaf.infn.it:7772",

"grid013f.cnaf.infn.it:9772",

"grid012f.cnaf.infn.it:9772",

"grid004f.cnaf.infn.it:7771"

};

LBAddresses = {

"ibm139.cnaf.infn.it:9000",

"fox.to.infn.it:9000"

};

HLRLocation = "lilith.to.infn.it:56568:/C=IT/O=INFN/OU=Personal Certificate/L=Torino/CN=Andrea Guarise/[email protected]";

MyProxyServer = "skurut.cesnet.cz";

]
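For installations not performed with LCFGng, the manual step of creating one directory per VO from vo_template could be scripted along these lines. This is a minimal sketch: the helper names are hypothetical, and the fallback to /opt/edg when EDG_WL_LOCATION is unset simply reuses the default installation prefix mentioned in this guide.

```python
import os
import shutil
from pathlib import Path


def default_etc_dir() -> Path:
    """$EDG_WL_LOCATION/etc, falling back to the default /opt/edg prefix."""
    return Path(os.environ.get("EDG_WL_LOCATION", "/opt/edg")) / "etc"


def create_vo_config(etc_dir: Path, vo_name: str) -> Path:
    """Copy vo_template/edg_wl_ui.conf into a lower-case per-VO directory."""
    template = etc_dir / "vo_template" / "edg_wl_ui.conf"
    vo_dir = etc_dir / vo_name.lower()  # VO directory names are lower case
    vo_dir.mkdir(parents=True, exist_ok=True)
    target = vo_dir / "edg_wl_ui.conf"
    if not target.exists():  # never overwrite a file that was already edited
        shutil.copyfile(template, target)
    return target
```

The copied file still has to be edited by hand so that VirtualOrganisation, NSAddresses and LBAddresses match the VO.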

4.5.3.1. Python Command Line Interface

Configuration of the command line User Interface is accomplished through the file

$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf

This file is installed by the LCFGng object edg-lcfg-cliconfig. If the installation is not performed using LCFGng, installing the UI rpm (edg-wl-ui-cli-X.Y.Z-K_gcc3_2_2.i486.rpm) creates the file

$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf.template

which you must copy to

$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf

and then update the content of the latter file as appropriate. The edg_wl_ui_cmd_var.conf file is a classad containing the following fields:

− requirements this is an expression representing the default value for the requirements expression in the JDL job description. This parameter is mandatory. The value of this parameter is assigned by the UI to the requirements attribute in the JDL if not specified by the user. If the user has instead provided an expression for the requirements attribute in the JDL, the one specified in the configuration file is combined with it through a logical AND. E.g. if the edg_wl_ui_cmd_var.conf configuration file contains:

requirements = other.GlueCEStateStatus == "Production" ;

and in the JDL file the user has specified:

requirements = other.GlueCEInfoLRMSType == "PBS";

then the job description that is passed to the NS contains:

requirements = (other.GlueCEInfoLRMSType == "PBS") && (other.GlueCEStateStatus == "Production");

Setting the requirements to TRUE in the configuration file therefore has no impact on the evaluation of job requirements, as it would result in:

requirements = (other.GlueCEInfoLRMSType == "PBS") && TRUE ;

− rank this is an expression representing the default value for the rank expression in the JDL job description. The value of this parameter is assigned by the UI to the rank attribute in the JDL if not specified by the user. This parameter is mandatory.

− RetryCount this is an integer representing the default value for the number of submission retries for a job upon failure due to some grid component (i.e. not to the job itself). The value of this parameter is assigned by the UI to the RetryCount attribute in the JDL if not specified by the user.

− DefaultVo this is a string representing the name of the virtual organisation to be taken as the user’s VO (VirtualOrganisation attribute in the JDL) if the user has not specified it in the credentials VOMS extension, directly in the job description or through the --vo option. This attribute can either be set to “unspecified” or be omitted from the file to mean that no default is set for the VO.

− ErrorStorage this is a string representing the path of the directory where the UI creates log files. This directory is not created by the UI, so it must already exist. Default for this parameter is /tmp.

− OutputStorage this is a string defining the path of the directory where the job OutputSandbox files are stored if not specified by the user through command options. This directory is not created by the UI, so it must already exist. Default for this parameter is /tmp.

− ListenerStorage this is a string defining the path of the directory in which the pipes are created where the edg_grid_console_shadow process saves the job standard streams for interactive jobs. Default for this parameter is /tmp.

− LoggingDestination this is a string defining the address (<host>[:<port>]) of the logging service (edg-wl-logd logging daemon) to be targeted when logging events. The UI first checks the environment for the EDG_WL_LOG_DESTINATION variable, and only if this is not set is the value of the LoggingDestination parameter taken into account.

− LoggingTimeout this is an integer representing the timeout in seconds for the asynchronous logging function called by the UI when logging events to the LB. The recommended value for UIs that are non-local to the logging service (edg-wl-logd logging daemon) is not less than 30 seconds.

− LoggingSyncTimeout this is an integer representing the timeout in seconds for the synchronous logging function called by the UI when logging events to the LB. The recommended value is not less than 30 seconds.

− DefaultStatusLevel this is an integer defining the default level of verbosity for the edg-job-status command. Possible values are 0, 1 and 2, where 0 means minimum verbosity. Default for this parameter is 0.

− DefaultLogInfoLevel this is an integer defining the default level of verbosity for the edg-job-get-logging-info command. Possible values are 0, 1 and 2, where 0 means minimum verbosity. Default for this parameter is 0.

− NSLoggerLevel this is an integer defining the quantity of information logged by the NS client. Possible values range from 0 to 6, where 0 means that no information is logged. Default for this parameter is 0.
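The way the default requirements expression is combined with a user-supplied one can be illustrated with a small sketch. The function name is hypothetical and the string concatenation only mimics the behaviour described for the requirements field; the real UI operates on classad expressions rather than on plain strings.

```python
from typing import Optional


def merge_requirements(user_expr: Optional[str], default_expr: str) -> str:
    """Return the requirements expression placed in the job description.

    With no user expression the configured default is used as is;
    otherwise both are parenthesised and joined with a logical AND.
    """
    if user_expr is None or not user_expr.strip():
        return default_expr
    return f"({user_expr.strip()}) && ({default_expr})"


merged = merge_requirements(
    'other.GlueCEInfoLRMSType == "PBS"',
    'other.GlueCEStateStatus == "Production"',
)
print(merged)
```

Because the default is always ANDed in, a site-wide constraint such as the "Production" status check cannot be dropped by a user who writes a custom requirements expression.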

Hereafter is provided an example of the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf configuration file.

[

requirements = other.GlueCEStateStatus == "Production" ;

rank = - other.GlueCEStateEstimatedResponseTime ;

RetryCount = 3 ;

ErrorStorage= "/var/tmp" ;

OutputStorage="/tmp";

ListenerStorage = "/tmp" ;

LoggingTimeout = 30 ;

LoggingSyncTimeout = 45 ;

LoggingDestination = "ibm139.cnaf.infn.it:9002" ;

DefaultStatusLevel = 1 ;

DefaultLogInfoLevel = 0;

NSLoggerLevel = 2;

DefaultVo = "cms";

]

The files

$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_err.conf

and

$EDG_WL_LOCATION/etc/edg_wl_ui_cmd_help.conf

contain respectively the error codes and error messages returned by the UI and the text describing the usage of the commands.

4.5.3.2. Java GUI

The Java GUI is composed of three components:

− JobSubmitter

− JobMonitor

− JDLEditor

Configuration of the Java GUI is accomplished through the file

$EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf

This file is installed by the LCFGng object edg-lcfg-guiconfig. If the installation is not performed using LCFGng, installing the UI rpm (edg-wl-ui-gui-X.Y.Z-K.i486.rpm) creates the file

$EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf.template

which you must copy to

$EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf

and then update the content of the latter file as appropriate. The edg_wl_ui_gui_var.conf file is a classad containing the following fields:

− JDLEDefaultSchema this is a string representing the default schema used by the JDLEditor for building the rank and requirements expressions in the JDL job description. This should be the schema of the Information Service describing the resources targeted by the job submissions.

The following three attributes -- requirements, rank and rankMPI -- are elements of a sub-classad that has the name of the schema (see example below):

− requirements this is an expression representing the default value for the requirements expression in the JDL job description. This parameter is mandatory. The value of this parameter is assigned by the UI to the requirements attribute in the JDL if not specified by the user. If the user has instead provided an expression for the requirements attribute in the JDL, the one specified in the configuration file is combined with it through a logical AND. E.g. if the edg_wl_ui_gui_var.conf configuration file contains:

requirements = other.GlueCEStateStatus == "Production" ;

and in the JDL file the user has specified:

requirements = other.GlueCEInfoLRMSType == "PBS";

then the job description that is passed to the RB will contain:

requirements = (other.GlueCEInfoLRMSType == "PBS") && (other.GlueCEStateStatus == "Production");

Setting the requirements to TRUE in the configuration file therefore has no impact on the evaluation of job requirements, as it would result in:

requirements = (other.GlueCEInfoLRMSType == "PBS") && TRUE ;

This parameter is repeated in the configuration file once for each supported schema (see example below).

− rank this is an expression representing the default value for the rank expression in the JDL job description. The value of this parameter is assigned by the UI to the rank attribute in the JDL if not specified by the user. This parameter is mandatory. This parameter is repeated in the configuration file once for each supported schema (see example below).

− rankMPI this is an expression representing the default value for the rank expression for MPI jobs (JobType = “MPICH”) in the JDL job description. The value of this parameter is assigned by the UI to the rank attribute in the JDL if not specified by the user. If this parameter is not present in the configuration file, the GUI also uses the expression specified for the rank parameter as the default for MPI jobs. This parameter is repeated in the configuration file once for each supported schema (see example below).

− RetryCount this is an integer representing the default value for the number of submission retries for a job upon failure due to some grid component (i.e. not to the job itself). The value of this parameter is assigned by the UI to the RetryCount attribute in the JDL if not specified by the user.

− ErrorStorage this is a string representing the path of the directory where the JDLEditor component creates parsing error log files. This directory is not created by the GUI component, so it must already exist on the machine. Default for this parameter is /tmp.

− LoggingDestination this is a string defining the address (<host>[:<port>]) of the logging service (edg-wl-logd logging daemon) to be targeted when logging events. The UI first checks the environment for the EDG_WL_LOG_DESTINATION variable, and only if this is not set is the value of the LoggingDestination parameter taken into account.

− LoggingTimeout this is an integer representing the timeout in seconds for the asynchronous logging function called by the UI when logging events to the LB. The recommended value for UIs that are non-local to the logging service (edg-wl-logd logging daemon) is not less than 30 seconds.

− LoggingSyncTimeout this is an integer representing the timeout in seconds for the synchronous logging function called by the UI when logging events to the LB. The recommended value is not less than 30 seconds.

− NSLoggerLevel this is an integer defining the quantity of information logged by the NS client. Possible values range from 0 to 6, where 0 means that no information is logged. Default for this parameter is 0.

Hereafter is provided an example of the $EDG_WL_LOCATION/etc/edg_wl_ui_gui_var.conf configuration file. In the following example supported schemas are Glue and the old EDG one.

[

JDLEDefaultSchema = "Glue" ;

Glue = [

rank = - other.GlueCEStateEstimatedResponseTime ;

rankMPI = other.GlueCEStateFreeCPUs;

requirements = other.GlueCEStateStatus == "Production";

] ;

EDG = [

rank = - other.EstimatedTraversalTime ;

rankMPI = other.FreeCPUs;

requirements = other.Active;

] ;

RetryCount = 3 ;

ErrorStorage= "/tmp" ;

LoggingTimeout = 30 ;

LoggingSyncTimeout = 60 ;

LoggingDestination = "ibm139.cnaf.infn.it:9002" ;

NSLoggerLevel = 0;

]
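How the per-schema sub-classads and the rankMPI fallback interact can be sketched as below. This is an illustration only: the dictionary stands in for the parsed classad, the function name and the "Normal" job type value are assumptions, and rankMPI has been deliberately left out of the EDG entry to show the fallback to rank.

```python
from typing import Any, Dict, Optional

# Stand-in for a parsed edg_wl_ui_gui_var.conf (values mirror the guide's example).
GUI_CONF: Dict[str, Any] = {
    "JDLEDefaultSchema": "Glue",
    "Glue": {
        "rank": "- other.GlueCEStateEstimatedResponseTime",
        "rankMPI": "other.GlueCEStateFreeCPUs",
        "requirements": 'other.GlueCEStateStatus == "Production"',
    },
    "EDG": {
        "rank": "- other.EstimatedTraversalTime",
        "requirements": "other.Active",
    },
}


def default_rank(
    conf: Dict[str, Any],
    job_type: str = "Normal",
    schema: Optional[str] = None,
) -> str:
    """Pick the default rank expression for the given schema and job type."""
    section = conf[schema or conf["JDLEDefaultSchema"]]
    if job_type == "MPICH" and "rankMPI" in section:
        return section["rankMPI"]
    # Non-MPI jobs, or no rankMPI configured: use the plain rank expression.
    return section["rank"]
```

Keeping one sub-classad per schema lets the JDLEditor switch Information Service schemas without mixing attribute names from Glue and EDG in a single expression.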

Additional files installed in $EDG_WL_LOCATION/etc are the Information Service schema definition files:

− edg_wl_ui_jdle_<IS_schema>.xml (Glue and EDG are currently supported)

the Condor DTD for parsing job descriptions written in classad/xml format:

− condor.dtd

and lastly the Log4j properties file (see 5.7.1):

− edg_wl_ui_gui_log4j.properties

4.5.4. Environment variables

Environment variables that have to be set for the User Interface are listed hereafter:

− X509_USER_KEY the user private key file path. Default value is $HOME/.globus/userkey.pem

− X509_USER_CERT the user certificate file path. Default value is $HOME/.globus/usercert.pem

− X509_CERT_DIR the trusted certificate directory and ca-signing-policy directory. Default value is /etc/grid-security/certificates

− X509_USER_PROXY the user proxy certificate file path. Default value is /tmp/x509up_u<UID> where UID is the user identifier on the machine as required by GSI.

These variables are used by the GSI layer to establish the security context. Moreover there are:

− GLOBUS_LOCATION the Globus rpms installation path. It defaults to /opt/globus

− EDG_WL_LOCATION the User Interface installation path. It has to be set only if installation has been made in a non-standard location. It defaults to /opt/edg

− EDG_WL_LOG_DESTINATION address (<host>[:<port>]) of the logging service (edg-wl-logd logging daemon) to be targeted when logging events. This variable takes precedence over the value set in the UI configuration. It defaults to localhost:9002.

− EDG_WL_LOG_TIMEOUT the timeout in seconds for the asynchronous logging function called by the UI when logging events to the LB. The recommended value for UIs that are non-local to the logging service (edg-wl-logd logging daemon) is not less than 30 seconds. This variable takes precedence over the value set in the UI configuration.

− EDG_WL_LOG_SYNC_TIMEOUT the timeout in seconds for the synchronous logging function called by the UI when logging events to the LB. The recommended value is not less than 30 seconds. This variable takes precedence over the value set in the UI configuration.
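The precedence of the EDG_WL_LOG_* variables over the UI configuration file can be summarised with the following sketch. The function and the LOGGING_SETTINGS table are hypothetical; the pairing of variables with configuration attributes follows this guide, while the use of an explicit fallback when neither source is set is an assumption.

```python
import os
from typing import Mapping, Optional

# (environment variable, configuration attribute) pairs named in this guide.
LOGGING_SETTINGS = {
    "destination": ("EDG_WL_LOG_DESTINATION", "LoggingDestination"),
    "timeout": ("EDG_WL_LOG_TIMEOUT", "LoggingTimeout"),
    "sync_timeout": ("EDG_WL_LOG_SYNC_TIMEOUT", "LoggingSyncTimeout"),
}


def resolve_logging_setting(
    setting: str,
    conf: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return the environment value if set, else the configuration value."""
    env = os.environ if environ is None else environ
    env_name, conf_name = LOGGING_SETTINGS[setting]
    if env.get(env_name):
        return env[env_name]
    if conf_name in conf:
        return str(conf[conf_name])
    return fallback
```

Passing the environment as an argument keeps the function easy to exercise without modifying the real process environment.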

4.5.4.1. Python Command Line Interface

− EDG_WL_UI_CONFIG_VAR Non-standard location of the command line interface configuration file edg_wl_ui_cmd_var.conf. This variable points to the file absolute path.

− EDG_WL_UI_CONFIG_VO Non-standard location of the vo-specific UI configuration file edg_wl_ui.conf. This variable points to the file absolute path.

4.5.4.2. Java GUI

− EDG_WL_GUI_CONFIG_VAR Non-standard location of the GUI configuration file edg_wl_ui_gui_var.conf. This variable points to the file absolute path.

− EDG_WL_GUI_CONFIG_VO Non-standard location of the vo-specific GUI configuration file edg_wl_ui.conf. This variable points to the file absolute path.

The GUI components also require setting of the following environment variables if the corresponding packages are not installed in the standard location (/usr/share/java):

− JAVA_INSTALL_PATH the installation path of the Java 2 Development Kit

− COG_INSTALL_PATH the installation path of the Globus Java CoG Kit

− LOG4J_INSTALL_PATH the installation path of the Log4J package

− CLASSADJ_INSTALL_PATH the installation path of the customised Condor classads java library

5. OPERATING THE SYSTEM

For security purposes all the WMS daemons run with proxy certificates. These certificates are generated by the start-up scripts described in the following sections, before the applications are started. The lifetime of proxies created by the start-up scripts is 24 hours. To provide the daemons with valid proxies for their whole lifetime, administrators need to ensure that new proxies are generated regularly. This can be achieved by adding the following lines to the machine's /etc/crontab:

57 2,8,14,20 * * * root service edg-wl-locallogger proxy

57 2,8,14,20 * * * root service edg-wl-lbserver proxy

57 2,8,14,20 * * * root service edg-wl-proxyrenewal proxy

57 2,8,14,20 * * * root service edg-wl-ns proxy

With these entries cron creates new proxies every six hours.
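As a quick sanity check of the schedule above, the gaps between consecutive cron runs can be compared with the proxy lifetime. The sketch below assumes that proxies created through the proxy option have the same 24 hour lifetime as those created at start-up; the function name is hypothetical.

```python
from datetime import timedelta

PROXY_LIFETIME = timedelta(hours=24)
CRON_MINUTE = 57
CRON_HOURS = [2, 8, 14, 20]


def renewal_gaps(hours, minute):
    """Gaps between consecutive daily cron runs, wrapping past midnight."""
    times = sorted(timedelta(hours=h, minutes=minute) for h in hours)
    following = times[1:] + [times[0] + timedelta(days=1)]
    return [later - earlier for earlier, later in zip(times, following)]


gaps = renewal_gaps(CRON_HOURS, CRON_MINUTE)
longest = max(gaps)
print(longest, longest < PROXY_LIFETIME)  # prints "6:00:00 True"
```

With a six hour gap and a 24 hour lifetime, up to three consecutive renewals can fail before a daemon is left with an expired proxy.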

5.1. LB LOCAL-LOGGER

5.1.1. Starting and stopping daemons

To run the LB local-logger services, it suffices to issue as root the following command:

/etc/rc.d/init.d/edg-wl-locallogger start

This makes both the edg-wl-logd and the edg-wl-interlogd processes start. The same can be done issuing the following commands:

<install path>/sbin/edg-wl-logd <options>

<install path>/sbin/edg-wl-interlogd <options>

Both daemons recognize a common set of options:

--key=<proxyfile> despite its name, this should refer to the host proxy file (this option overrides the value of the environment variable X509_USER_KEY). An example of option usage: --key=/tmp/hostproxy.pem

--cert=<certfile> despite its name, this should refer to the host proxy file (this option overrides the value of the environment variable X509_USER_CERT). An example of option usage: --cert=/tmp/hostproxy.pem

--CAdir=<certdir> trusted certificate and ca-signing-policy directory (this option overrides the value of the environment variable X509_CERT_DIR). An example of option usage: --CAdir=/etc/grid-security/certificates

--file-prefix=<file path> absolute path of the file where the logged events are stored locally. The default value is /tmp/dglog, which carries a risk of data loss in case of reboot. Note that the same value must be specified for both daemons.

--socket=<local socket path> Unix socket used for direct communication between the daemons.

--debug make the process run in foreground and produce diagnostics

--verbose be more verbose (makes sense with --debug only)

--help display usage message and exit

edg-wl-logd recognises the following specific option:

--port=<port number> listen on a non-default port

edg-wl-interlogd should currently be invoked with the --book option. It disables sending the events to persistent log storage, which is not yet supported.

Using the options explicitly is recommended rather than relying on the corresponding environment variables. The LB local-logger services can be stopped using the edg-wl-locallogger script with the stop option.
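When the two daemons are started by hand, building both command lines from one set of values helps to keep --file-prefix identical, as required above. The helper below is a hypothetical sketch that only assembles argument lists from the options documented in this section; it does not start anything, and the example paths are placeholders.

```python
from typing import List, Optional, Tuple


def local_logger_commands(
    install_path: str,
    host_proxy: str,
    ca_dir: str,
    file_prefix: str,
    port: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Build argument lists for edg-wl-logd and edg-wl-interlogd."""
    common = [
        f"--key={host_proxy}",   # despite the names, both options
        f"--cert={host_proxy}",  # refer to the host proxy file
        f"--CAdir={ca_dir}",
        f"--file-prefix={file_prefix}",  # must be identical for both daemons
    ]
    logd = [f"{install_path}/sbin/edg-wl-logd", *common]
    if port is not None:
        logd.append(f"--port={port}")  # option specific to edg-wl-logd
    interlogd = [f"{install_path}/sbin/edg-wl-interlogd", *common, "--book"]
    return logd, interlogd


logd_cmd, interlogd_cmd = local_logger_commands(
    "/opt/edg",
    "/tmp/hostproxy.pem",
    "/etc/grid-security/certificates",
    "/var/tmp/dglog",  # placeholder location outside /tmp
)
```

Each list can then be passed to subprocess.run by a script running as root.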

5.1.2. Troubleshooting

If the LB local-logger services are started in debug mode (i.e. using the --debug option), the daemons log fatal failures with syslog().

5.2. LB SERVER

5.2.1. Starting and stopping daemons

To run the LB server services, it suffices to issue as root the following command:

/etc/rc.d/init.d/edg-wl-lbserver start

This makes the edg-wl-bkserverd process start. The same can be done issuing the following command:

<install path>/sbin/edg-wl-bkserverd <options>

The daemon recognizes this set of options:

--key=<keyfile> despite its name, this should refer to the host proxy file (this option overrides the value of the environment variable X509_USER_KEY). An example of option usage: --key=/tmp/hostproxy.pem

--cert=<certfile> despite its name, this should refer to the host proxy file (this option overrides the value of the environment variable X509_USER_CERT). An example of option usage: --cert=/tmp/hostproxy.pem

--CAdir=<certdir> trusted certificate and ca-signing-policy directory (this option overrides the value of the environment variable X509_CERT_DIR). An example of option usage: --CAdir=/etc/grid-security/certificates

--debug make the process run in foreground to produce diagnostics

--port=<port number> listen on a non-default port. Note that the release 2 server also listens on <port number> + 1 for incoming events.

--mysql=<database> connect to a non-default MySQL database. The database string takes the form user/password@hostname:database.

--slaves=<number> spawn that many slaves, meaning that this number of client connections can be handled in parallel.

--semaphores=<number> use that many semaphores for internal job locking. Defaults to the number of slaves and should not be changed in normal operation.

--pidfile=<filename> use non-default PID & lock file.

Using the options explicitly is recommended rather than relying on the corresponding environment variables. The LB server services can be stopped using the edg-wl-lbserver script with the stop option.
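The user/password@hostname:database form accepted by --mysql can be checked before it is handed to the daemon. The parser below is a hypothetical sketch of that format, assuming that every part except the password must be non-empty; the sample values in the usage line are placeholders, not real credentials.

```python
from typing import NamedTuple


class MySQLTarget(NamedTuple):
    user: str
    password: str
    hostname: str
    database: str


def parse_mysql_option(value: str) -> MySQLTarget:
    """Split a user/password@hostname:database string into its parts."""
    # Split on the last "@" so that a password containing "@" still parses.
    credentials, _, location = value.rpartition("@")
    user, slash, password = credentials.partition("/")
    hostname, colon, database = location.partition(":")
    if not (user and slash and hostname and colon and database):
        raise ValueError(
            f"expected user/password@hostname:database, got {value!r}"
        )
    return MySQLTarget(user, password, hostname, database)


target = parse_mysql_option("lbuser/secret@localhost:lbdb")
```

Validating the string up front gives a clearer error than a failed database connection at daemon start-up.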

5.2.2. Creating custom indices

By default the LB server indexes data according to JobId only. Because the querying capabilities of LB release 2 were considerably extended, the server refuses to process a query that would not use any index; this prevents overloading the underlying database engine. Consequently, even a trivial query such as “give me all my jobs” results in an error in the default setup, since under certain conditions processing such a query may require handling gigabytes of data. The server administrator can create and modify the set of indices, and thereby control the set of supported queries, using the edg-wl-bkindex utility. It is invoked in the following way:

edg-wl-bkindex [options] [<index file>]

where the recognised options are:

--mysql=<database> non-default database to connect to, same as for edg-wl-bkserverd

--verbose be verbose

--dump dump the current settings to stdout

--really really perform reindexing. Without this option the required actions are reported but not actually done.

The index file follows this syntax (subset of ClassAd syntax):

index-file ::= [ JobIndices = { index-description * } ]

index-description ::= column-description | list-of-columns

list-of-columns ::= { column-description + }

column-description ::= [ column-type; column-name; prefix-len ? ]

column-type ::= type = "user" | type = "system"

column-name ::= name = "actual column name"

prefix-len ::= prefixlen = integer

The only top-level attribute JobIndices is a list (possibly empty) of index descriptions. Each index description is either a single column or a list of columns (where the order is important). A column is described by the mandatory attributes type and name, and an optional attribute prefixlen. Possible values of type are “system” for LB internal attributes and “user” for user tags, i.e. arbitrary name=value pairs assigned to a job by the user. Currently supported system column names are “owner”, “destination” and “location”. Names of user tags are arbitrary as long as their length is less than 60 characters and they contain only ASCII printable characters excluding backtick (`). The prefixlen value may be used to restrict indexing of columns, which may grow rather long, to a fixed size. This becomes necessary with compound indices as MySQL limits the total size of an index to 250 bytes only.

The following example index file contains two indices: the first one on a single system attribute, the job owner, and the second one composed of the system attribute job destination and a user tag called “experiment number”:

[

JobIndices = {

[ type = "system"; name = "owner" ],

{

[ type = "system"; name = "destination";

prefixlen = 100 ],

[ type = "user"; name = "experiment number";

prefixlen = 100]

}

}

]

There is a sample configuration file, edg_wl_query_index.conf, containing definitions of indices on all the currently supported indexed system attributes, i.e. “owner”, “destination”, and “location”.

The edg-wl-bkindex utility should be run with the --really option while the LB server is shut down. Depending on the actual size of the database, the reindexing may take a rather long time. The LB server becomes aware of the new index setup automatically on its startup.
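Before running edg-wl-bkindex, the constraints stated above for index descriptions can be checked with a small script. The sketch below is hypothetical: it works on Python dictionaries rather than on a parsed index file, and it treats the sum of the prefixlen values as a rough estimate of the index size, which is an assumption about how the 250 byte limit applies.

```python
from typing import Any, Dict, List

SYSTEM_COLUMNS = {"owner", "destination", "location"}
MAX_INDEX_BYTES = 250  # MySQL limit quoted in the text


def valid_user_tag(name: str) -> bool:
    """True if name has fewer than 60 printable ASCII chars and no backtick."""
    return 0 < len(name) < 60 and all(
        32 <= ord(ch) <= 126 and ch != "`" for ch in name
    )


def check_index(columns: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found in one index (a list of columns)."""
    problems = []
    for col in columns:
        name = str(col.get("name", ""))
        kind = col.get("type")
        if kind == "system":
            if name not in SYSTEM_COLUMNS:
                problems.append(f"unsupported system column: {name}")
        elif kind == "user":
            if not valid_user_tag(name):
                problems.append(f"invalid user tag name: {name!r}")
        else:
            problems.append(f"type must be 'system' or 'user': {kind!r}")
    total = sum(int(col.get("prefixlen", 0)) for col in columns)
    if total > MAX_INDEX_BYTES:
        problems.append(f"prefixlen total {total} exceeds {MAX_INDEX_BYTES}")
    return problems


# The compound index from the example file in this section.
compound = [
    {"type": "system", "name": "destination", "prefixlen": 100},
    {"type": "user", "name": "experiment number", "prefixlen": 100},
]
```

An empty result from check_index(compound) means the description respects the naming rules and stays within the assumed size budget.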

5.2.3. Purging the LB database

The edg-wl-bkpurge process, whose executable is installed in <install path>/sbin, is not a daemon but a utility which should be run periodically (e.g. using a cron job) in order to remove inactive jobs (i.e. those that entered the Cleared status more than a given amount of time ago) from the LB database. This utility recognizes the following set of options:

--log data being purged from the database are dumped to stdout

--outfile=<file> data being purged from the database are dumped to the file named <file>

--mysql=<database> name of the database to be purged. It must be the same used by edg-wl-bkserverd (this option is not required in the standard set-up)

--timeout=<timeout>[smhd] removes data for all jobs that entered the “Cleared” status more than <timeout> [seconds/minutes/hours/days] ago.

--debug print diagnostics on stderr

--nopurge dry run mode. It doesn't really purge (useful for debugging purposes)

--aborted, -a also delete from the database data for jobs that have entered the “Aborted” status

If --log is specified, the data in ULM format are dumped to stdout (or <file>). Normally information is appended to the file. The file is locked with flock (_LOCK_EX) to prevent race conditions, e.g. rotating logs. An example use of this utility is the following cron line to delete all data older than 14 days from the database:

edg-wl-bkpurge --log --outfile=/var/log/dglb-data.log --timeout=14d

In general, the edg-wl-bkpurge utility may generate rather high background load on the database engine. Therefore it should not be run too frequently (once a day is appropriate), and preferably at the time of low LB server activity.
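The --timeout value format can be illustrated by converting it to a duration. The function below is a hypothetical sketch that requires an explicit unit suffix; whether edg-wl-bkpurge itself accepts a bare number, and which unit it would then assume, is not stated in this guide.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_purge_timeout(value: str) -> timedelta:
    """Convert a <timeout>[smhd] string such as "14d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"expected <number><s|m|h|d>, got {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


two_weeks = parse_purge_timeout("14d")
```

Such a conversion is handy in wrapper scripts that want to log how far back a purge run will reach.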

5.2.4. Experimental R-GMA Interface

The LB server release 2 is capable of feeding the R-GMA infrastructure with notifications on job state changes. The functionality is enabled by starting edg-wl-bkserverd with the option --rgmaexport. In addition, the environment variables EDG_WL_RGMA_FILE and EDG_WL_RGMA_SOCK have to be set to point to a file and a local UNIX socket name used for communication with the R-GMA producer. The producer itself is the Java program LBProducer. It has to be invoked with the two environment variables set to the same values. It takes no further arguments.

5.2.5. Troubleshooting

If the LB server services are started in debug mode (that is, using the --debug option), the daemons log fatal failures with syslog().

5.3. SERVICES RUNNING IN THE “RB NODE”: NS, WM, JC, LM

5.3.1. Starting and stopping NS, WM, JC and LM daemons

Startup of NS, WM, JC and LM can be achieved by issuing:

/etc/rc.d/init.d/edg-wl-ns start

/etc/rc.d/init.d/edg-wl-wm start

/etc/rc.d/init.d/edg-wl-jc start

/etc/rc.d/init.d/edg-wl-lm start

In the same way, stopping is achieved by:

/etc/rc.d/init.d/edg-wl-ns stop

/etc/rc.d/init.d/edg-wl-wm stop

/etc/rc.d/init.d/edg-wl-jc stop

/etc/rc.d/init.d/edg-wl-lm stop

The startup script for JC also starts and stops the underlying CondorG service. These scripts start the daemons under the appropriate user accounts. The startup scripts can also be used to query the current status of the daemons, using the status option.

Moreover, it is strongly recommended to configure the machine so that all these services are started at system startup.
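As a minimal sketch, the command lines for the four services can be generated from the action name. The function init_commands is a hypothetical helper, not part of the WMS distribution; it only builds the command strings listed in this section and does not run them.

```python
INIT_DIR = "/etc/rc.d/init.d"
# Service suffixes in the order used in this section: NS, WM, JC, LM.
SERVICES = ("ns", "wm", "jc", "lm")
# Actions mentioned in this section.
ACTIONS = ("start", "stop", "status")


def init_commands(action):
    """Return the init-script command lines for the given action."""
    if action not in ACTIONS:
        raise ValueError("unsupported action: %r" % action)
    return ["%s/edg-wl-%s %s" % (INIT_DIR, name, action) for name in SERVICES]


for line in init_commands("status"):
    print(line)
```

The resulting lines would have to be executed with the privileges required by the init scripts.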

5.3.2. NS, WM, JC, LM troubleshooting

The NS, WM, JC and LM services produce log files recording their various events. These files can be used to debug abnormal behaviors of these services. The log-file names and the level of debugging (i.e. the level of “detail”) can be changed by directly modifying the configuration file.

5.4. PROXY RENEWAL

5.4.1. Starting and stopping daemon

To run the PR daemon, it suffices to issue as root the following command:

/etc/rc.d/init.d/edg-wl-proxyrenewal start

This makes the edg-wl-renewald process start. The same can be done issuing the following command:

The daemon recognizes the following set of options:

--debug make the process run in foreground and produce diagnostics

--repository=<dir> directory where registered proxies will be stored (/var/spool/edg-wl-renewd by default)

5.4.2. Troubleshooting

If the PR service is started in debug mode (i.e. using the --debug option), the daemon prints out fatal failures to stdout.

5.5. PURGER

The input/output sandbox directories for a given job are cleared in the “RB node” when the output sandbox files of the job are retrieved (with the edg-job-get-output command), or when the job is declared as aborted. To prevent the file system of the “RB node” from filling up, the system administrator can run the Storage Purger daemon, which is in charge of cleaning old input/output sandboxes according to a policy that has to be specified.

The edg-wl-purgeStorage executable (the storage purger) accepts the following options:

-a=<argument>
--allocated-limit=<argument>
Defines a percentage of used space on the input/output sandbox disk; if the used space exceeds the specified argument, purging is triggered.

-b
--brute-rm
If enabled, ALL the input/output sandbox directories are removed (use this option with care).

-e
--enable-progress
Enables the progress indicator bar.

-f
--fake-rm
Does not perform any directory removal.

-l=<argument>
--log-file=<argument>
Logs the purge information into the specified file. If not specified, the default file is $EDG_WL_TMP/edg-wl-purgeStorage-<date>-<time>.log

-p=<argument>
--staging-path=<argument>
Defines the sandbox staging path (should be the same value specified as SandboxStagingPath in the NS configuration section, see section 4.2.2.2). If not specified, the directory referenced by the $EDG_WL_TMP environment variable is considered.

-q
--quiet
Does not create any log file (any settings specified with the option -l will be ignored).

-t=<argument>
--threshold=<argument>
The storage purger deletes the sandbox directories for jobs which have been in DONE or ABORTED status for at least <argument> seconds (while the sandbox directories are cleared for all jobs in CLEARED status). If not specified, the default value is 604800 (one week).

The storage purger should be regularly invoked, for example via a cron job. These are two examples of cron rules:

edg-wl-purgeStorage-weekly.cron

# Execute the "purger" command at midnight, 4:00 AM, 8:00 AM, 12:00 noon,
# 4:00 PM, and 8:00 PM (0 */4) on each Sunday (sun).

0 */4 * * sun $EDG_WL_LOCATION/sbin/edg-wl-purgeStorage -l $EDG_WL_LOCATION_VAR/log/edg-wl-purgeStorage.log -t 604800

edg-wl-purgeStorage-hourly.cron

# Execute the "purger" command every day except Sunday,
# once an hour, if and only if the percentage of
# used space is greater than 40%

0 */1 * * mon-sat $EDG_WL_LOCATION/sbin/edg-wl-purgeStorage -l $EDG_WL_LOCATION_VAR/log/edg-wl-purgeStorage.log -t 604800 -a 40
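The rules described for the -a and -t options can be summarised in a short Python sketch. The function names are hypothetical and this is not the purger's actual code. In particular, the sketch assumes that purging is unconditional when -a is not given, which this guide does not state explicitly.

```python
DEFAULT_THRESHOLD = 604800  # default of -t: one week, in seconds


def purge_triggered(used_percent, allocated_limit=None):
    """Decide whether a purge run should do anything at all.

    With -a, purging is triggered only when the used space exceeds the
    limit. Without -a it always runs (assumption of this sketch).
    """
    return allocated_limit is None or used_percent > allocated_limit


def should_remove(status, seconds_in_status, threshold=DEFAULT_THRESHOLD):
    """Decide whether the sandbox directory of one job is removed."""
    if status == "CLEARED":
        # Cleared jobs are purged regardless of their age.
        return True
    if status in ("DONE", "ABORTED"):
        # Finished jobs are purged once they are old enough.
        return seconds_in_status >= threshold
    # Jobs in any other status keep their sandbox.
    return False


print(purge_triggered(55, 40), should_remove("DONE", 3600))
```

With the hourly cron rule above (-t 604800 -a 40), a job that has been DONE for one hour would keep its sandbox even when the disk is more than 40% full.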

5.6. GRID ACCOUNTING

5.6.1. Starting and stopping daemon

5.6.1.1. HLR server

To run the HLR server daemon, it suffices to issue as root the following command:

/etc/rc.d/init.d/edg-wl-hlrd start

This makes the edg-wl-dgas-hlrd process start. The daemon recognizes the following set of options:

--help
-h
Print an informative help message describing the options and then exit.

--conf
-c
Specifies the full path to the configuration file to be used (see section 4.4.2.1).

--port
-p
Specifies the listening port number.

--log
-l
Specifies the full path for the log file.

5.6.1.2. PA Server

To run the PA server daemon, it suffices to issue as root the following command:

/etc/rc.d/init.d/edg-wl-pad start

This makes the edg-wl-dgas-pad process start. The daemon recognizes the following set of options:

--help
-h
Print an informative help message describing the options and then exit.

--conf
-c
Specifies the full path to the configuration file to be used (see section 4.4.2.2).

--port
-p
Specifies the listening port number.

--log
-l
Specifies the full path for the log file.

5.6.2. HLR server administration

To use the DGAS software, it is first necessary to create the accounts for users and resources. Users and resources are divided into groups. These groups can be used to collect statistics about the expenses/earnings of a given subset of the users/resources of the HLR. As an example, let’s suppose that the users Ua, Ub, Uc, Ud and the resources Ra, Rb belong to the group Ga. When a user (e.g. Ub) spends some credits, his group is debited by the same amount. When a resource (e.g. Ra) earns some credits, the corresponding group is credited with the same amount. Funds are containers of groups. They can be used, for example, to divide users or resources belonging to different VOs on HLRs used to manage multiple VOs, or to achieve a better granularity on large HLRs.

At present this division into groups and funds has limited advantages, since it only affects the way earnings/expenses are computed, but in future releases of the software it will allow different priorities to be set for the users. The steps needed to create the accounts are:

• Creating the fund accounts
• Creating the group accounts
• Creating the user/resource accounts
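The group bookkeeping of the Ua/Ub/Ra example can be pictured with a toy Python model. The class Ledger and its methods are purely illustrative assumptions; they do not reflect the DGAS database schema or its API, and funds are left out because this guide only describes them as containers of groups.

```python
from collections import defaultdict


class Ledger:
    """Toy model of group-level bookkeeping (not the DGAS implementation)."""

    def __init__(self):
        self.group_of = {}                    # user or resource id -> group id
        self.group_spent = defaultdict(int)   # credits debited to each group
        self.group_earned = defaultdict(int)  # credits credited to each group

    def add_member(self, member_id, gid):
        """Register a user or a resource as a member of a group."""
        self.group_of[member_id] = gid

    def user_spends(self, uid, credits):
        """A user spends credits: the user's group is debited by the same amount."""
        self.group_spent[self.group_of[uid]] += credits

    def resource_earns(self, rid, credits):
        """A resource earns credits: its group is credited with the same amount."""
        self.group_earned[self.group_of[rid]] += credits


ledger = Ledger()
for member in ("Ua", "Ub", "Uc", "Ud", "Ra", "Rb"):
    ledger.add_member(member, "Ga")

ledger.user_spends("Ub", 10)     # Ub pays 10 credits
ledger.resource_earns("Ra", 10)  # Ra receives 10 credits
print(ledger.group_spent["Ga"], ledger.group_earned["Ga"])
```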

5.6.2.1. Creating a Fund account

The command to use is:

edg-wl-dgas-hlr-addFund [OPTIONS]

-i --interactive

-C --Conf <Configuration file name>

-f --fid "fid"

-F --Force "Force the record insertion"

-d --descr "description"

-t --total "total funds"

where:

-C is used to specify the configuration file needed by the command to point to the HLR database
-f is used to specify a fund identifier (fid) that will be used to address the fund
-d is used to specify a human-readable reminder of what this fund is
-t is used to assign credits to the fund. You can use 0 as a default value

For example, in order to create the fund VO_2, a command such as:

edg-wl-dgas-hlr-addFund -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \
-f VO_2 -d "Virtual Organization 2 account" -t 0

should be issued. To check if a fund has been correctly inserted, the following command can be used:

edg-wl-dgas-hlr-queryFund -C ../etc/edg-wl-dgas-hlr.conf

which should give an output like:

|VO_2|Virtual Organization 2 account|0|0|

5.6.2.2. Creating a Group account

The command to be used is:

edg-wl-dgas-hlr-addGroup [OPTIONS]

-i --interactive

-F --Force

-C --Conf Configuration file name

-g --gid "gid"

-d --descr "description"

-f --fid "fid"

-t --total "total funds"

-b --booked "booked funds" shouldn't be specified manually

-s --spent "spent funds" shouldn't be specified manually

where:

-C specifies the configuration file.
-g specifies a group identifier (gid) used to address this group
-d is a reminder of what the group is
-f is used to specify the fid of the fund to link this group with
-t is the amount of credits assigned to the group; you can use 0

For example, in order to create the group Group3, it is necessary to issue a command like:

edg-wl-dgas-hlr-addGroup -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \
-g Group3 -d "Users and resources of VO_2" -f VO_2 -t 0

To check if a Group has been correctly inserted, the following command can be used:

./edg-wl-dgas-hlr-queryGroup -C ../etc/edg-wl-dgas-hlr.conf

which should return an output like:

|Group3|Users and resources of VO_2|VO_2|0|0|0|

5.6.2.3. Creating a User account

The command to be used to create a user account is:

edg-wl-dgas-hlr-addUser [OPTIONS]

-i --interactive

-C --Conf Configuration file name

-F --Force Force record insertion

-u --uid "uid"

-e --email "email"

-d --descr "description"

-c --cert "cert subject"

-g --gid "gid"

-f --fid "fid"

-a --assigned "assigned funds"

-b --booked "booked funds" shouldn't be specified manually

-s --spent "spent funds" shouldn't be specified manually

where:

-C specifies the configuration file.
-u specifies an identifier for the user (uid) (nothing to do with the Unix uid!)
-d a reminder of who the user is, e.g. his real name
-c the user X509 certificate subject
-g gid of the user group.
-f fid of the user fund.
-a amount of credits assigned to the user. Use 0.

For example, to create the user Ua, it is necessary to issue a command like:

edg-wl-dgas-hlr-addUser -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \

-u Ua -e user.a@userdomain -d "UserA desc" \

-c "UserCertSubject" -g Group3 -f VO_2 -a 0

To check if a User has been correctly inserted, the command:

edg-wl-dgas-hlr-queryUser -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" -U

can be used. If everything is fine, the output should look like:

|Ua|user.a@userdomain|UserA desc|UserCertSubject|Group3|VO_2|0|0|0|
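The query commands print one pipe-delimited record per account. As a minimal sketch, such a record could be split into named fields as follows. The field names in USER_FIELDS are an assumption inferred from the order of the addUser options (uid, email, description, certificate subject, gid, fid, then assigned, booked and spent funds); they are not documented output headers. The sketch also assumes that no field contains a "|" character.

```python
def parse_record(line):
    """Split a record such as '|a|b|c|' into the list ['a', 'b', 'c']."""
    line = line.strip()
    if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
        raise ValueError("not a pipe-delimited record: %r" % line)
    return line[1:-1].split("|")


# Hypothetical field names, guessed from the order of the addUser options.
USER_FIELDS = ("uid", "email", "descr", "cert", "gid", "fid",
               "assigned", "booked", "spent")


def parse_user(line):
    """Map the values of a user record onto the assumed field names."""
    values = parse_record(line)
    if len(values) != len(USER_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(USER_FIELDS), len(values)))
    return dict(zip(USER_FIELDS, values))


sample = "|Ua|user.a@userdomain|UserA desc|UserCertSubject|Group3|VO_2|0|0|0|"
user = parse_user(sample)
print(user["uid"], user["gid"], user["fid"])
```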

5.6.2.4. Creating a Resource account

The command used to create a resource account is:

edg-wl-dgas-hlr-addResource [OPTIONS]

-i --interactive

-F --Force

-C --Conf Configuration file name

-r --rid "rid"

-e --email "email for contact person"

-d --descr "description"

-c --cert "cert subject"

-g --gid "gid"

-f --fid "fid"

where:

-C specifies the configuration file.
-r specifies an identifier (rid) used to address the resource
-e specifies an email address of a contact person for that resource
-d specifies a description of the resource
-c the CeID of the resource assigned to this account
-g the gid of the resource group
-f the fid of the resource fund

For example, to create the resource Ra, it is necessary to issue a command like:

edg-wl-dgas-hlr-addResource -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" \
-r Ra -e resource.administrator@domain -d "Res desc" \
-c "CeID" -g Group3 -f VO_2

To check if a Resource has been correctly inserted, the following command can be used:

edg-wl-dgas-hlr-queryResource -C "/opt/edg/etc/edg-wl-dgas-hlr.conf" -R

The output should be something like:

|Ra|resource.administrator@domain|Res desc|CeID|Group3|VO_2|0|

5.6.2.5. Deleting accounts

The commands:

edg-wl-dgas-hlr-delFund

edg-wl-dgas-hlr-delGroup

edg-wl-dgas-hlr-delResource

edg-wl-dgas-hlr-delUser

can be used to delete respectively fund, group, resource, user accounts.

5.6.3. Troubleshooting

Both the HLR and the PA server produce log files, as described in the previous sections. These files can be used to debug abnormal behaviors of the services.

5.7. USER INTERFACE (JAVA GUI)

As already mentioned in section 4.5.3.2, the Java GUI encompasses three components:

− JobSubmitter
− JobMonitor
− JDLEditor

To start these components, it suffices to issue respectively the following commands:

$EDG_WL_LOCATION/sbin/edg-wl-ui-jobsubmitter.{sh,csh}

$EDG_WL_LOCATION/sbin/edg-wl-ui-jobmonitor.{sh,csh}

$EDG_WL_LOCATION/sbin/edg-wl-ui-jdleditor.{sh,csh}

where the scripts listed above are installed by the edg-wl-ui-gui-X.Y.Z-K.i486.rpm package. It is important to note that the other GUI components can be started from the JobSubmitter.

With the exception of the JDLEditor, which does not interact with any external WMS module, the X509_USER_KEY, X509_USER_CERT and X509_USER_PROXY environment variables have to be set before starting the GUI components if the user credentials are not stored in the default locations. It is worth recalling that the GUI only needs a valid proxy certificate to work correctly.

Lastly, the environment variables JAVA_INSTALL_PATH, COG_INSTALL_PATH, LOG4J_INSTALL_PATH and CLASSADJ_INSTALL_PATH, which are used by the scripts to set the Java CLASSPATH, need to be set if the corresponding packages are not installed in the standard location (/usr/share/java).

5.7.1. Troubleshooting

The GUI produces log files recording its various events. These files can be used to debug abnormal behaviors of the three components. The log-file names and the level of debugging (i.e. the level of “detail”) can be changed by directly modifying the log4j configuration file $EDG_WL_LOCATION/etc/edg_wl_ui_gui_log4j.properties. The configuration file is written in Java properties format (e.g. <attribute name>=<attribute value>), as in the examples reported below.

log4j allows logging information to be printed to multiple destinations, or appenders. The most important kinds of appenders available are: console, files, GUI components and remote socket servers. It is possible to log information in a synchronous or asynchronous manner. When using an appender it is important to define and associate a layout with it. The layout is the way the logging information is formatted during the logging request. The PatternLayout allows the user to specify the output format according to his preferences or criteria. Here is an example of a configuration file:

# Setting root level

(1) log4j.rootLogger=DEBUG, appender1

# Setting appender1 as console appender

(2) log4j.appender.appender1=org.apache.log4j.ConsoleAppender

# Setting PatternLayout for appender1

(3) log4j.appender.appender1.layout=org.apache.log4j.PatternLayout

(4) log4j.appender.appender1.layout.ConversionPattern=%-4r [%t] %-5p %c - %m%n

Line (1) is used to set the logging level and the appender used to log information. In this example the level is DEBUG, indicating that all logging requests are enabled, i.e. all logging information will be written to the appender. It is possible to change this value by choosing one of the available logger levels (DEBUG, INFO, WARN, ERROR and FATAL). If the logging level is set to ERROR then only ERROR and FATAL logging requests will be activated. The appender used to log is set to appender1. Line (2) defines appender1 as a ConsoleAppender to write log information to the user console. Lines (3) and (4) are necessary to define the layout used to write log information to the appender. Line (3) sets the layout as a PatternLayout and line (4) describes the pattern (ConversionPattern) to use.

The following configuration file defines two appenders: a console appender and a file appender to write logging information to a file.

# Setting root level

(1) log4j.rootLogger=ERROR, appender1, appender2

# Setting appender1 as console appender

(2) log4j.appender.appender1=org.apache.log4j.ConsoleAppender

# Setting PatternLayout for appender1

(3) log4j.appender.appender1.layout=org.apache.log4j.PatternLayout

(4) log4j.appender.appender1.layout.ConversionPattern=%-4r [%t] %-5p %c - %m%n

# Setting appender2 as external file appender

(5) log4j.appender.appender2=org.apache.log4j.RollingFileAppender

(6) log4j.appender.appender2.File=example.log

(7) log4j.appender.appender2.MaxFileSize=200KB

(8) log4j.appender.appender2.MaxBackupIndex=1

# Setting PatternLayout for appender2

(9) log4j.appender.appender2.layout=org.apache.log4j.PatternLayout

(10) log4j.appender.appender2.layout.ConversionPattern=%-4r [%t] %-5p %c - %m%n

Lines (1) to (4) define appender1 as seen in the first example. Lines (5) to (10) set appender2 as a file appender. More precisely: line (5) sets the appender as a rolling file (the oldest lines are lost when the maximum size is reached); line (6) sets the name of the file the appender writes to; line (7) sets the maximum file size in Kbytes; line (8) says to use a second file to store information when roll-over occurs (the file will be named example.log.1). Lines (9) and (10) define the pattern layout in the same way as done for appender1.
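The effect of the root logger level in line (1) can be illustrated with a few lines of Python. This is only a model of the level ordering described above (DEBUG, INFO, WARN, ERROR, FATAL); it is not log4j code, and the function names are hypothetical.

```python
# Logger levels in increasing order of severity, as listed in this section.
LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "FATAL")


def is_enabled(request_level, configured_level):
    """A logging request is enabled when it is at least as severe as the
    level configured for the root logger."""
    return LEVELS.index(request_level) >= LEVELS.index(configured_level)


def enabled_levels(configured_level):
    """List the request levels that would reach the appenders."""
    return [level for level in LEVELS if is_enabled(level, configured_level)]


print(enabled_levels("ERROR"))
```

For the two examples above, a root level of DEBUG lets every request through, while ERROR lets only ERROR and FATAL requests reach the appenders.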

Here are the meanings of some fields used to define the pattern layout:

%r number of milliseconds elapsed since the start of the application
%t thread making the log request
%p level of the log statement
%c name of the logger associated with the log request
%m message of the statement
%n new line feed
%d date and time
%F java file name
%L java file line number

The log4j configuration file for the GUI that is installed by the rpm edg-wl-ui-gui-X.Y.Z-K_gcc3_2_2.i486.rpm is reported below. As can be seen, it sets the appender as a rolling file (/var/tmp/edg_wl_ui_gui_log4j.log) and the logging level to FATAL. Some alternatives are provided (commented out) in the file; the user can in any case customize this file according to her/his needs.

# Setting root level

# log4j.rootLogger=loggerLevel, appenderList

# Possible values for loggerLevel are: DEBUG, INFO, WARN, ERROR, FATAL

# appenderList is a list of appenders separated by a comma

log4j.rootLogger=FATAL, myAppender

# Setting myAppender as ConsoleAppender

# log4j.appender.myAppender=org.apache.log4j.ConsoleAppender

# Setting myAppender as external file appender

# log4j.appender.myAppender=org.apache.log4j.FileAppender

# log4j.appender.myAppender.File=/var/tmp/edg_wl_ui_gui_log4j.log

# Setting myAppender as external rolling file appender

log4j.appender.myAppender=org.apache.log4j.RollingFileAppender

log4j.appender.myAppender.File=/var/tmp/edg_wl_ui_gui_log4j.log

log4j.appender.myAppender.MaxFileSize=500KB

log4j.appender.myAppender.MaxBackupIndex=1

# Setting PatternLayout for myAppender

log4j.appender.myAppender.layout=org.apache.log4j.PatternLayout

# Log4j basic configurator conversion pattern

# log4j.appender.myAppender.layout.ConversionPattern=%-4r [%t] %-5p %c - %m%n

# Use this conversion pattern to show java file name and line number

# log4j.appender.myAppender.layout.ConversionPattern=%d [%t] %-5p %c (%F:%L) %n \t %m%n

log4j.appender.myAppender.layout.ConversionPattern=%d [%t] %-5p %c %n \t %m%n

6. USER GUIDE

The software module of the WMS that allows the user to access the main services made available by the components of the scheduling sub-layer is the User Interface, which hence represents the entry point to the whole system. Sections 6.1.1 and 6.1.2 provide a general description of the UI, dealing with the security management, common behaviours, environment variables to be set etc. Section 6.1.3 describes the Job Submission User Interface commands in a Unix man-page style.

6.1. USER INTERFACE

The Job Submission UI is the module of the WMS allowing the user to access the main services made available by the components of the scheduling sub-layer. The user interaction with the system is assured by means of a JDL and a command-driven user interface providing commands to perform a certain set of basic operations. The main operations made possible by the UI are:

- Submit a job for execution on a remote Computing Element, also encompassing:
  - automatic resource discovery and selection
  - staging of the application sandbox (input sandbox)
- Find the list of resources suitable to run a specific job
- Cancel one or more submitted jobs
- Retrieve the output files of a completed job (output sandbox)
- Retrieve and display bookkeeping information about submitted jobs
- Retrieve and display logging information about submitted jobs
- Retrieve checkpoint states of a submitted checkpointable job
- Start a local listener for an interactive job

The User Interface depends on two other Workload Management System components:

- the Network Server, which provides support for the job control functionality
- the Logging and Bookkeeping Service, which provides support for the job monitoring functionality.

6.1.1. Security

For the DataGrid to be an effective framework for largely distributed computation, users, user processes and grid services must work in a secure environment. Due to this, all interactions between WMS components, especially those that are network-separated, are mutually authenticated: depending on the specific interaction, an entity authenticates itself to the other peer using either its own credential or a delegated user credential or both. For example, when the User Interface passes a job to the Network Server, the UI authenticates using a delegated user credential (a proxy certificate) whereas the NS uses its own service credential. The same happens when the UI interacts with the Logging and Bookkeeping service. The UI uses a delegated user credential to limit the risk of compromising the original credential in the hands of the user. The user or service identity and their public key are included in an X.509 certificate signed by an EDG trusted Certification Authority (CA), whose purpose is to guarantee the association between that public key and its owner.

Consequently, to take advantage of the UI commands the user has to possess a valid X.509 certificate on the submitting machine, consisting of two files: the certificate file and the private key file. The location of these two files is assumed to be pointed to respectively by “$X509_USER_CERT” and “$X509_USER_KEY”, or to be “$HOME/.globus/usercert.pem” and “$HOME/.globus/userkey.pem” if the X509 environment variables are not set. Actually the user certificate and private key files are not mandatory on the UI machine; indeed they are only needed for the creation of the delegated user credentials through the grid-proxy-init or edg-voms-proxy-init commands. It is for example possible to download the proxy credentials from a trusted site and work with them without having the certificate and key available locally.

What is really needed is the user proxy credentials: all UI commands, when started, check for the existence and expiration date of the user proxy credentials in the location pointed to by “$X509_USER_PROXY”, or in “/tmp/x509up_u<UID>” (<UID> is the user identifier in the submitting machine OS) if the X509 environment variable is not set. If the proxy certificate does not exist or has expired, the UI returns an error message to the user and exits.

Once a job has been submitted by the UI, it passes through several components of the WMS (e.g. the NS, the WM, the JC, CondorG etc.) before it completes its execution. At each step, operations related to the job could require authentication by a certificate. For example, during the scheduling phase the RB needs to get some information about the user who wants to schedule a job, and the certificate of the user could be needed to access this information. Similarly, a valid user certificate is needed by JC/CondorG to submit a job to the CE. Moreover, JC has to be able to repeat this process, e.g. in case of a crash of the CE on which the job is running; therefore a valid user certificate is needed for the whole job lifetime.

A job gets a valid proxy certificate when it is submitted by the UI to the NS. The validity of such a certificate is usually set to 12 hours, hence problems could occur if the job spends more time on the CE (in a queue or running) than the lifetime of its proxy certificate. In order to submit long-running jobs, users can either generate proxy credentials with a longer lifetime, using respectively the --valid and --hours options of the grid-proxy-init and edg-voms-proxy-init commands, or (more safely) rely on the features of the MyProxy package, as introduced in section 4.3. The underlying idea is that the user registers in a MyProxy server a valid long-term proxy certificate that will be used by the WMS to perform a periodic credential renewal for the submitted job; in this way the user is no longer obliged to create very long lifetime proxies when submitting jobs lasting for a long time. A more detailed description of this mechanism is provided in the following paragraph.
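The lookup order for the user proxy described above can be sketched in Python as follows. The function default_proxy_path is a hypothetical helper and not part of the UI code; treating an empty X509_USER_PROXY value as "not set" is an assumption of this sketch.

```python
import os


def default_proxy_path(environ=None, uid=None):
    """Return the path where the user proxy is expected to be found.

    $X509_USER_PROXY wins if it is set; otherwise the conventional
    /tmp/x509up_u<UID> location is used. The environ and uid arguments
    exist only to make the function easy to try out.
    """
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid
    # Assumption: an empty variable is handled like an unset one.
    return environ.get("X509_USER_PROXY") or "/tmp/x509up_u%d" % uid


# Example with an explicit environment and user identifier.
print(default_proxy_path({}, 501))
print(default_proxy_path({"X509_USER_PROXY": "/home/user/myproxy.pem"}, 501))
```

A real check would additionally have to verify that the file exists and that the proxy has not expired, as the UI commands do.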

6.1.1.1. MyProxy

The MyProxy credential repository system consists of a server and a set of client tools that can be used to delegate and retrieve credentials to and from a server. Normally, a user would start by using the myproxy-init client program along with the permanent credentials necessary to contact the server, and delegate a set of proxy credentials to the server along with authentication information and retrieval restrictions.

6.1.1.1.1. MyProxyClient

The set of binaries provided for the client is made of the following files:

myproxy-init
myproxy-info
myproxy-destroy
myproxy-get-delegation

The myproxy-init command allows you to create and send a delegated proxy to a myproxy server for later retrieval; in order to launch it you have to make sure you are able to execute the grid-proxy-init GLOBUS command (i.e. the binary is visible from your $PATH environment and the required cert files are either stored in the common path or specified with the X509 variables). You can use the command as follows (you will be asked for your PEM passphrase):

myproxy-init -s <host name> -t <hours> -d -n

The myproxy-init command stores a user proxy in the repository specified by <host name> (the -s option). The default lifetime of proxies retrieved from the repository will be set to <hours> (see -t) and no password authorization is permitted when fetching the proxy from the repository (the -n option). The proxy is stored under the same username as the subject in your certificate (-d).

The myproxy-info command returns the remaining lifetime of the proxy in the repository along with the subject name of the proxy owner (in our case it will be the same as in your proxy certificate). So if you want to get information about the stored proxies you can issue:

myproxy-info -s <host name> -d

where the -s and -d options have already been explained for the myproxy-init command.

The myproxy-destroy command simply destroys any existing proxy stored in the myproxy server. You can use it as follows:

myproxy-destroy -s <host name> -d

where the -s and -d options have already been explained for the myproxy-init command.

The myproxy-get-delegation command is used to retrieve a proxy from those stored in the myproxy server. You can use it as follows:

myproxy-get-delegation -s <host name> -d -t <hours> \
-o <output file> -a <user proxy>

You should end up with a retrieved proxy in <output file>, which is valid for

IST-2000-25182 PUBLIC 82 / 146

Page 83: WMS SW Admin and User Guideserver11.infn.it/workload-grid/docs/DataGrid-01-TEN-0118-1_2.pdf · − Take into account changes in the rpm generation procedure. − Add missing info

Doc. Identifier:DataGrid-01-TEN-0118-1_2

WP1 - WMS SOFTWARE

ADMINISTRATOR AND USER GUIDE

Date: 24/11/2003

<hours> hours. It is worth noting that the environment variable MYPROXY_SERVER can be set to tell to all these programs the hostname where the myproxy server is running.
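As an illustration of how the -s option and the MYPROXY_SERVER variable relate, the following Python sketch assembles a myproxy-init command line using only the options described above. The function name myproxy_init_argv and the decision to raise an error when no server is known are assumptions of this sketch, not part of the MyProxy distribution.

```python
import os


def myproxy_init_argv(hours, server=None, env=None):
    """Build a myproxy-init argument list from the options described in the text."""
    env = os.environ if env is None else env
    argv = ["myproxy-init"]
    if server:
        # Explicit repository host: the -s option.
        argv += ["-s", server]
    elif not env.get("MYPROXY_SERVER"):
        # Assumption of this sketch: refuse to go on when neither -s
        # nor MYPROXY_SERVER tells which server to contact.
        raise ValueError("no myproxy server: pass server or set MYPROXY_SERVER")
    # -t: default lifetime (hours) of proxies retrieved from the repository
    # -d: store the proxy under the certificate subject
    # -n: no password authorization when fetching the proxy
    argv += ["-t", str(hours), "-d", "-n"]
    return argv
```

The returned list could be handed to subprocess.run on a machine where the MyProxy client is installed; when MYPROXY_SERVER is set, the -s option is simply left out.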

6.1.2. Common behaviours

A User Interface installation mainly consists of four directories, bin, lib, etc and share, that are created under the UI installation path, usually pointed to by the EDG_WL_LOCATION environment variable. If this variable is not set or its value is not correct, the default value is assumed to be “/opt/edg”.

bin contains the command executables; it is recommended to add it to the user PATH environment variable so that UI commands can be used from any location.

lib contains the shared libraries (wrappers of the NS/LB APIs) implementing functionalities for accessing the NS and LB services. Moreover lib contains a subdirectory named python containing some python modules also needed for accessing the underlying services.

etc is the UI configuration area: it contains the file with the mapping between error codes and error messages (edg_wl_ui_cmd_err.conf), the file with the detailed description of each command (edg_wl_ui_cmd_help.conf) and the actual configuration file (edg_wl_ui_cmd_var.conf). The latter is the only file that may need to be edited and tailored to the user/platform characteristics and needs. It contains the following information, which is read by the commands and influences their behaviour (see section 4.5.3 for details):

- default location of the local storage areas for the Output sandbox files,
- default location for the UI log files,
- default values for the JDL mandatory attributes,
- default values for timeouts when logging events to the LB,
- default logging destination,
- user’s default VO,
- default level of information displayed by the monitoring commands.

Inside etc there is a directory for each supported EDG Virtual Organisation, named after the VO (e.g. for atlas we will have etc/atlas/), that contains a VO-specific configuration file edg_wl_ui.conf specifying the list of Network Servers and LBs accessible for the given VO.

When started, UI commands first check whether EDG_WL_LOCATION is set and then search for the etc directory containing their configuration files in the following locations, in order of precedence: “$EDG_WL_LOCATION”, “/opt/edg”, “/”, “/usr/local”. If none of the locations contains the needed files, an error is returned to the user.

Since several users on the same machine can use a single installation of the UI, people concurrently issuing UI commands share the same configuration files. However, users (or groups of users) having particular needs can “customise” the UI configuration through the --config and --config-vo options supported by each UI command.
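The search order described above can be sketched in Python as follows. This is only an illustration: the names candidate_roots and find_etc_dir are hypothetical, and treating the three edg_wl_ui_cmd_*.conf files as "the needed files" is an assumption, since the text does not say exactly which files the commands check for.

```python
import os

# Assumption: the "needed files" are the three files listed for the etc area.
NEEDED_FILES = (
    "edg_wl_ui_cmd_err.conf",
    "edg_wl_ui_cmd_help.conf",
    "edg_wl_ui_cmd_var.conf",
)


def candidate_roots(env):
    """Return the installation roots in the order of precedence given in the text."""
    roots = []
    location = env.get("EDG_WL_LOCATION")
    if location:
        roots.append(location)
    roots += ["/opt/edg", "/", "/usr/local"]
    return roots


def find_etc_dir(env=None, is_file=os.path.isfile):
    """Return the first etc directory holding all needed files, or raise an error."""
    env = os.environ if env is None else env
    for root in candidate_roots(env):
        etc = os.path.join(root, "etc")
        if all(is_file(os.path.join(etc, name)) for name in NEEDED_FILES):
            return etc
    raise FileNotFoundError("UI configuration files not found in any known location")
```

The is_file parameter exists only so that the lookup can be exercised without touching the real file system.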


Every command launched with “--config file_path” reads its configuration settings from the file pointed to by “file_path” instead of the default configuration file. The same happens for the VO-specific configuration file if the command is started with “--config-vo vo_file_path”. Hence the user only needs to create such a file according to her/his needs and use the appropriate options to work under “private” settings.

If the user wants to make this change permanent, avoiding the --config option on each issued command, she/he can set the environment variable EDG_WL_UI_CONFIG_VAR to point to the non-standard path of the configuration file: if that variable is set, commands read their settings from the file “$EDG_WL_UI_CONFIG_VAR”. The --config option, however, takes precedence over all other settings. Exactly the same applies to the EDG_WL_UI_CONFIG_VO environment variable and the --config-vo option.

It is important to note that, since the job identifier edg_jobId (see section 6.1.3, edg-job-submit) implicitly holds the information about the LB that is managing the corresponding job, all the commands taking an edg_jobId as input parameter ignore the LB addresses listed in the configuration file when performing the requested operation, even if the --config-vo option has been specified.

The options common to all UI commands are listed hereafter:

--config file_path

--noint

--debug

--logfile file_path

--version

--help

The --noint option skips all interactive questions to the user and goes ahead with the command execution. All warning messages and errors (if any) are written to the file <command_name>_<UID>_<PID>_<date_time>.log in the location specified in the configuration file instead of the standard output. It is important to note that when --noint is specified some checks on “dangerous actions” are skipped. For example, if job cancellation is requested with this option, the action is performed without asking the user for any confirmation. The same applies if the command output would overwrite an existing file, so it is recommended to use the --noint option only in a safe context.

The --debug option is mainly intended for testing and debugging purposes: it makes the commands print additional information while running. Every time an external API function call is encountered during the command execution, the values of the parameters passed to the API are printed to the user. The info messages are displayed on the standard output and are also written, together with possible errors and warnings, to <command_name>_<UID>_<PID>_<date_time>.log. If the --noint option is specified together with the --debug option, the debug messages are not printed on the standard output.

The --logfile <file_path> option allows the command log file to be relocated to the location pointed to by file_path.


The --version and --help options respectively make the commands display the UI current version and the command usage. Two further options that are common to almost all commands are --input and --output. The latter one makes the commands redirect the outcome to the file specified as option argument whilst the former reads a list of input items from the file given as option argument. The only exception is the edg-job-list-match command that does not have the --input option.

6.1.2.1. The --input option

For all commands, the file given as argument to the --input option shall contain a list of job identifiers in the following format: one edg_jobId per line, with comments beginning with a “#” or a “*” character. If the input file contains only one edg_jobId (see the description of the edg-job-submit command later in this document for details about the edg_jobId format), the request is submitted directly taking that edg_jobId as input; otherwise a menu is displayed listing all the contained items, i.e. something like:

---------------------------------------------------------------------------

1 : https://ibm139.cnaf.infn.it:9000/ZU9yOC7AP7AOEhMAHirG3w

2 : https://ibm139.cnaf.infn.it:9000/ZU9yOC767gJOEhMAHirG3w

3 : https://ibm135.cnaf.infn.it:9000/ZU9yOC7AP7A55TREAHirG3w

4 : https://grid012f.cnaf.infn.it:7846/ZUHY6707AP7AOEhMAHirG3w

5 : https://grid012f.cnaf.infn.it:9000/Cde341P7AOEhMAHirG3w

6 : https://ibm139.cnaf.infn.it:9000/BgT8T6H_L92FsKq3OeTWOw

7 : https://ibm139.cnaf.infn.it:9000/lYlPBQez7fiXx9qq7BEdyw

8 : https://ibm139.cnaf.infn.it:9000/_f0Bm_s6UdFPZIEjSglipg

a : all

q : quit

---------------------------------------------------------------------------

Choose one or more edg_jobId(s) in the list - [1-10]all:

The user can choose one or more jobs from the list entering the corresponding numbers. Single jobs can be selected specifying the numbers associated to the job identifiers separated by commas. Ranges can also be selected specifying ends separated by a dash and it is worth mentioning that it is possible to select at the same time ranges and single jobs. E.g.:

- 2 makes the command take the second listed edg_jobId as input
- 1,4 makes the command take the first and the fourth listed edg_jobIds as input
- 2-5 makes the command take the listed edg_jobIds from 2 to 5 (ends included) as input
- 1,3-5,8 selects the first job id in the list, the ids from the third to the fifth (ends included) and finally the eighth one
- all makes the command take all listed edg_jobIds as input
- q makes the command quit

The default value for the choice is all. If the --input option is used together with --noint, then all edg_jobIds contained in the input file are taken into account by the command.
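The input file format and the selection syntax described above can be sketched in Python as follows. This is a minimal illustration, not the UI implementation: the names read_ids and parse_selection are hypothetical, comments are assumed to occupy whole lines, and the short answer a is assumed to be equivalent to all because the menu lists it that way.

```python
def read_ids(lines):
    """Return the identifiers of an --input file, skipping blank and comment lines."""
    ids = []
    for line in lines:
        text = line.strip()
        # Assumption: a comment is a whole line starting with "#" or "*".
        if text and not text.startswith(("#", "*")):
            ids.append(text)
    return ids


def parse_selection(answer, count):
    """Translate a menu answer into 1-based item numbers; None means quit."""
    answer = answer.strip().lower()
    if answer in ("", "a", "all"):
        return list(range(1, count + 1))  # the default choice is "all"
    if answer == "q":
        return None
    chosen = []
    for part in answer.split(","):
        first, dash, last = part.strip().partition("-")
        low = int(first)
        high = int(last) if dash else low  # a single number is a range of one
        if not 1 <= low <= high <= count:
            raise ValueError("selection out of range: " + part)
        for number in range(low, high + 1):
            if number not in chosen:
                chosen.append(number)
    return chosen
```

With eight listed identifiers, parse_selection("1,3-5,8", 8) yields the first, third, fourth, fifth and eighth items, matching the example in the list above.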


There are some commands whose --input behaviour differs from the one just described. One of them is edg-job-submit. First of all, in this case the input file contains CEIds instead of edg_jobIds; moreover, only one CE at a time can be the target of a submission, hence the user is allowed to choose one and only one CEId. The default value for the choice is “1”, i.e. the first CEId in the list. This is also the choice automatically made by the command when the --input option is used together with the --noint one. The other commands are edg-job-attach and edg-job-get-chkpt, whose --input option allows the user to select one (and only one) of the edg_jobIds contained in the input file.


6.1.3. Commands description

In this section we describe the syntax and behaviour of the commands made available by the UI to allow job submission, monitoring and control. In the command synopses the mandatory arguments are shown between angle brackets (<arg>) whilst the optional ones are shown between square brackets ([arg]).

6.1.3.1. edg-job-submit

Allows the user to submit a job for execution on remote resources in a grid.

SYNOPSIS

    edg-job-submit [options] <jdl_file>

Options:

--help

--version

--vo <vo_name>

--input, -i <file_path>

--resource, -r <ce_id>

--chkpt <file_path>

--nolisten

--nogui

--nomsg

--config, -c <file_path>

--config-vo <file_path>

--output, -o <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

edg-job-submit is the command for submitting jobs to the DataGrid and hence allows the user to run a job at one or several remote resources. edg-job-submit requires as input a job description file in which job characteristics and requirements are expressed by means of Condor class-ad-like expressions. The order of the other arguments does not matter, but the job description file has to be the last argument of this command.

The job description file given in input to this command is syntactically checked, and default values are assigned to some of the mandatory attributes that were not provided, in order to create a meaningful class-ad. The resulting job-ad is sent to the NS, which forwards it to the WM; the WM, via the RB/Matchmaker, finds the resource best matching the job (match-making), and then the JC submits the job to it. The match-making algorithm is described in detail in Annex 7.7.

Upon successful completion this command returns to the user the submitted job identifier edg_jobId (a string that unambiguously identifies the job in the whole EDG), generated by the User Interface, which can later be used as a handle to perform monitoring and control operations on the job (e.g. see edg-job-status described later in this document). The format of the edg_jobId is as follows:

    https://Lbserver_address[:port]/unique_string

The unique_string is an md5 string computed taking into account the following information:

- IP of the User Interface machine,
- timestamp,
- process ID (more UI instances may occur on the same machine),
- sequence or just random number (if the User Interface submits jobs in batches and more than one per second can be submitted).

The final md5 sum is encoded using modified Base64 encoding (“:” is used instead of “/”), ensuring reasonable uniqueness and compactness of job IDs.
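To make the construction above concrete, here is a hedged Python sketch of how such an identifier could be assembled. It is not the UI code: the function names, the way the four ingredients are concatenated before hashing, and the removal of the trailing "=" padding are all assumptions of this sketch. Only the ingredients, the md5 hash and the "/" to ":" substitution come from the description above; the exact encoding table is not specified beyond that one character.

```python
import base64
import hashlib
import os
import random
import time


def make_unique_string(ip, timestamp, pid, sequence):
    """Hash the four ingredients with md5 and encode the digest compactly."""
    # Assumption: the ingredients are simply joined into one string.
    seed = "{}:{}:{}:{}".format(ip, timestamp, pid, sequence)
    digest = hashlib.md5(seed.encode("ascii")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    # Modified Base64 as stated in the text: ":" replaces "/".
    # Dropping the "=" padding is an assumption made here for compactness.
    return encoded.replace("/", ":").rstrip("=")


def make_job_id(lb_host, lb_port=None, ip="127.0.0.1"):
    """Return an identifier of the form https://Lbserver_address[:port]/unique_string."""
    address = lb_host if lb_port is None else "{}:{}".format(lb_host, lb_port)
    unique = make_unique_string(ip, time.time(), os.getpid(), random.getrandbits(32))
    return "https://{}/{}".format(address, unique)
```

Because an md5 digest is 16 bytes long, the encoded string in this sketch is always 22 characters once the padding is removed.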

The structure of the edg_jobId, which may appear somewhat complex and not easily readable, has been conceived to ensure uniqueness and at the same time to contain information needed by the components of the WMS to fulfil user requests.

The --vo option allows the user to specify the Virtual Organisation she/he is currently working for when working with non-VOMS credentials. Indeed, if the user proxy credentials currently available on the UI contain VOMS extensions specifying one or more VOs, the default VO from the proxy credentials takes precedence over all other possible choices and is taken as the current working VO. If the --vo option is not used (and the proxy credentials do not contain VOMS extensions), the VirtualOrganisation attribute in the JDL is considered. If this attribute has not been specified in the JDL, the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf configuration file (DefaultVo field) is considered. Otherwise an error is returned to the user.

The --resource option can be used to target the job submission to a specific known resource identified by the provided Computing Element identifier ce_id (returned by edg-job-list-match, described later in this document). The CE identifier is a string published in the IS (the GlueCEUniqueID field in the Glue schema) that uniquely identifies a resource belonging to the Grid. The admitted format for CEId is:

<full-hostname>:<port-number>/jobmanager-<service>-<queue-name>

where <service> is for example lsf, pbs, bqs or condor, but can also be a different string, as it is freely set by the site administrator when the queue is set up.

When the --resource option is specified, the WMS completely skips the match-making process and directly submits the job to the requested CE. It is important to note that in this case the “.BrokerInfo” file is not generated even if data requirements have been specified in the JDL, so jobs submitted using this option should not rely on the .BrokerInfo file information when running on the CE.

The “.BrokerInfo” file is generated by the RB/Matchmaker during matchmaking and contains information about the location where input data specified in the JDL are physically stored, the SEs that are “close” to the CE chosen for submitting the job, etc. It is shipped within the InputSandbox to the CE where the job is going to run so that it can be used at run-time to get information (through the appropriate API) for accessing data. Details about the “.BrokerInfo” file and the BrokerInfo API can be found in [R1].

A way to perform direct submission to a given CE and at the same time have the “.BrokerInfo” file generated by the RB and shipped to the CE is to not use the --resource option and instead specify the following requirements in the JDL:

    Requirements = other.GlueCEUniqueID == <Ce_identifier>;

(e.g. Requirements = other.GlueCEUniqueID == "lxde01.pd.infn.it:2119/jobmanager-lsf-grid01";)

It is also possible to specify the target CE using the --input option. With the --input option an input_file must be supplied containing a list of target CE Ids. In this case the edg-job-submit command parses the input_file and displays on the standard output the list of CE Ids it contains. The user is then asked to choose one CEId among the listed ones, and the command then behaves exactly as already explained for the --resource option. The basic idea is to use as input_file the output file generated by the edg-job-list-match command when used with the --output option (see edg-job-list-match), which contains the list of CE Ids (if any) matching the requirements specified in the jobad.jdl file. An example of a possible sequence of commands is:

    $ edg-job-list-match --output CEList.out jobad.jdl
    $ edg-job-submit --input CEList.out jobad.jdl
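Before passing a CEId taken from such a list to a submission, a script may want to check it against the admitted format <full-hostname>:<port-number>/jobmanager-<service>-<queue-name> given earlier. The following Python sketch shows one way to do that. The name parse_ce_id is hypothetical, and the pattern assumes that <service> contains no "-" character, which the guide does not guarantee since the string is freely chosen by the site administrator.

```python
import re

# Assumption: <service> has no "-" so that it can be told apart from <queue-name>.
CE_ID = re.compile(
    r"^(?P<host>[^:/\s]+):(?P<port>\d+)/jobmanager-(?P<service>[^-\s]+)-(?P<queue>\S+)$"
)


def parse_ce_id(ce_id):
    """Split a CEId into host, port, service and queue, or raise ValueError."""
    match = CE_ID.match(ce_id.strip())
    if match is None:
        raise ValueError("not a valid CEId: " + ce_id)
    fields = match.groupdict()
    fields["port"] = int(fields["port"])
    return fields
```

Applied to the example identifier lxde01.pd.infn.it:2119/jobmanager-lsf-grid01, the function separates the host lxde01.pd.infn.it, the port 2119, the service lsf and the queue grid01.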

If CEList.out contains more than one CEId then the user is prompted to choose one Id from the list.

It is possible to redirect the returned edg_jobId to an output file using the --output option. If the file already exists, a check is performed: if the file was previously created by the edg-job-submit command (i.e. it contains a well-defined header), the returned edg_jobId is appended to the existing file every time the command is launched. If the file was not created by edg-job-submit, the user is prompted to choose whether to overwrite the file. If the answer is no, the command aborts.

The edg-job-submit command has a particular behaviour when the job description file contains the InputSandbox attribute, whose value is a list of file paths on the UI machine local disk. The purpose of the InputSandbox attribute is to stage, from the UI to the CE, files that are needed for the execution. For example, suppose a job needs for its execution a set of small files available on the submitting machine, and that for performance reasons it is preferable not to go through the WP2 data transfer services to stage these files on the executing node. The user can then use the InputSandbox attribute to specify the files that have to be staged from the submitting machine to the executing CE. All of them are transferred at job submission time, together with the job class-ad, to the NS, which stores them temporarily on its local disk. The JobWrapper then performs the staging of these files on the executing node. The files to be transferred to the “RB node” should be small, since filling up the RB node local storage means that no more jobs of this type can be submitted (see section 4.2.4.3).

This mechanism can also be used to stage a job executable available locally on the UI machine to the executing CE. In this case the user has to include this file in the InputSandbox list (specifying its absolute path in the file system of the UI machine) and specify only the file name as the Executable attribute value. On the contrary, if the executable is already available in the file system of the executing machine, the user has to specify as Executable an absolute path name for this file (if necessary using environment variables). The same applies to the standard input file, which is specified through the StdInput JDL attribute.

Since the InputSandbox expression can consist of a great number of file names, wildcards and environment variables can be used to specify the value of this attribute. Syntax and allowed wildcards are described in Annex 7.6.

It is important to note that, since the gridftp protocol (the protocol used for staging the InputSandbox files) in general does not preserve the x flag, the script specified as Executable in the JDL (on which chmod +x is done automatically by the WP1 JobWrapper) should perform a chmod +x on all the files transferred within the InputSandbox of the job that need execution permission.

For the standard output and error of the job the user shall instead always specify just file names (without any directory path) through the StdOutput and StdError JDL attributes. To have them staged back to the UI machine it suffices to list them in the OutputSandbox and use, after job completion, the edg-job-get-output command described later in this document.

The list of data specification JDL attributes is completed by the InputData and OutputData attributes. InputData refers to data used as input by the job that are not subject to staging and are stored in one or more storage elements and published in replica catalogues. When the user specifies the InputData attribute, she/he also has to provide the protocol her/his application is able to “speak” for accessing data (DataAccessProtocol attribute). The InputData attribute should contain a list of Logical File Names (LFN) and/or Grid Unique Identifiers (GUID). There is no need to specify the Replica Location Service to be contacted for resolving the logical file names and GUIDs to storage file names, as it is automatically determined by the WP2 software through the VO the user belongs to. This information is provided in the VirtualOrganisation JDL attribute (filled in by the UI). It is worth noting that the use of the WP2 getAccessCost in the ranking phase (see 7.7.3), i.e. ranking CEs according to the cost of accessing data, can be triggered through the JDL by setting the rank as follows:

Rank = other.DataAccessCost;

The OutputData attribute allows instead the user to ask for the automatic upload and registration of datasets produced by the job on the WN. Through this attribute it is possible to indicate for each output file the LFN to be used for registration and the SE on which the file has to be uploaded.


Both LFN and SE are optional in the sense that if no LFN is indicated then it is assigned automatically by the WP2 services (RM) and if no SE is indicated, the close SE is considered. OutputData is a list of classads, where each classad indicates the name of the file to be uploaded, the logical file to be used and the SE where the file has to be copied. E.g.:

OutputData = {
    [
        OutputFile = "dataset_1.out";
        StorageElement = "se1.cnaf.infn.it";
        LogicalFileName = "lfn:LFN_1"
    ],
    [
        OutputFile = "dataset_2.out";
        StorageElement = "se2.pd.infn.it";
        LogicalFileName = "lfn:LFN_2"
    ],
    [
        OutputFile = "dataset_3.out";
        StorageElement = "se3.cesnet.cz";
        LogicalFileName = "lfn:LFN_3"
    ]
};

If the attribute OutputData is found in the JDL then the JobWrapper at the end of the job calls the WP2 “copy And Register” service that copies the file from the WN onto the specified SE and registers it with the given LFN. As usual, logical file names have to be prefixed with the string “lfn:”. If the specified LFN is already in use, WP2 RM registers the file with a newly generated identifier GUID (Grid Unique Identifier). During this process the JobWrapper creates a file (named “DSUpload_<unique_jobid_string>.out”) that is put automatically in the OutputSandbox attribute list by the UI and can then be retrieved by the user. This file contains the results of the upload and registration process in the following format:

<FILE_NAME> <LFN | ERROR>

e.g. in our case we could have:

dataset_1.out LFN_1

dataset_2.out <GUID2>

dataset_3.out <error code returned by RM>

meaning that dataset_1.out was uploaded successfully and registered as LFN_1, dataset_2.out was uploaded successfully but with name <GUID2> (assigned by the ERM) since LFN_2 was already in use, and the upload of dataset_3.out failed for the reason specified by the reported error message.

It is worth noting that the StorageElement attributes of the OutputData list are not taken into account by the RB for the matchmaking, so the job could have run on a CE that is not close to the specified SEs. For this reason it is suggested (unless the user has particular needs) to omit the StorageElement specification so that the close SEs are automatically taken into account for the dataset upload.

The Arguments attribute in the JDL allows the user to specify all the command line arguments needed to start the job. They have to be specified as a single string, e.g. the job sum that is started with:

    $ sum N1 N2 -out result.out

is described by:

    Executable = "sum";
    Arguments = "N1 N2 -out result.out";

If you want to specify a quoted string inside the Arguments then you have to escape the quotes with the \ character. E.g. when describing a job like:

    $ grep -i "my name" *.txt

you will have to specify:

    Executable = "/bin/grep";
    Arguments = "-i \"my name\" *.txt";

Analogously, if the job takes as argument a string containing a special character (e.g. the job is the tail command issued on a file whose name contains the ampersand character, say file1&file2), since on the shell line you would have to write:

    $ tail -f file1\&file2

in the JDL you will have to write:

    Executable = "/usr/bin/tail";
    Arguments = "-f file1\\\&file2";

i.e. three \ characters before each special character.
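The escaping rules illustrated by the grep and tail examples can be sketched as a small Python helper. This is an illustration only: the name jdl_arguments is hypothetical, the set of special characters is limited to the ones the guide names (&, |, >, <), and special characters inside a quoted string are left untouched on the assumption that quoting alone is sufficient there. The guide also rules out the ` character in Arguments, so the sketch rejects it.

```python
SPECIAL = "&|<>"


def jdl_arguments(command_line_args):
    """Return a JDL Arguments entry for the given unescaped argument string."""
    if "`" in command_line_args:
        raise ValueError("the ` character cannot be used in Arguments")
    out = []
    in_quotes = False
    for ch in command_line_args:
        if ch == '"':
            in_quotes = not in_quotes
            out.append('\\"')  # a quote is escaped with a single backslash
        elif ch in SPECIAL and not in_quotes:
            out.append("\\\\\\" + ch)  # three backslashes before a special character
        else:
            out.append(ch)
    return 'Arguments = "{}";'.format("".join(out))
```

For the two examples above, the helper takes -i "my name" *.txt and -f file1&file2 as input and produces the same Arguments lines shown in the text.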


In general, special characters such as &, |, >, < are only allowed if specified inside a quoted string or preceded by a triple \. The character “`” cannot be specified in the Arguments attribute of the JDL.

The RetryCount attribute allows setting the number of submission retries for a job upon failure due to some grid component (i.e. not to the job itself). RetryCount has to be a positive number, and the actual number of submission retries for a job is the minimum between RetryCount itself and the value of the MaxRetryCount parameter in the WM configuration file (see section 4.2.2.3). Setting RetryCount to 0 suffices to disable job resubmission.

It is important to recall here that the safest way to submit long-running jobs is to use the proxy renewal feature provided by the WMS. To do this the user should use the myproxy-init command (see section 6.1.1.1) before edg-job-submit. The myproxy-init command registers in a MyProxy server a valid long-term certificate proxy that will be used by the WMS to perform a periodic credential renewal for the submitted job. When using the myproxy-init command the user has to specify, either through the -s option or the MYPROXY_SERVER environment variable, the host name of the MyProxy server where the certificate proxy is to be stored. To trigger the proxy renewal mechanism, the same MyProxy server address has to be specified in the JDL through the MyProxyServer attribute (this can also be made a default behaviour through the configuration, see 4.5.3.1). An example of the JDL setting is provided hereafter:

    MyProxyServer = "skurut.cesnet.cz";

Note that the port number must not be provided.

Interactive jobs are specified by setting the JDL JobType attribute to “Interactive”. When an interactive job is submitted, the edg-job-submit command starts a grid console shadow process in the background that listens on a port for the job standard streams. Moreover the edg-job-submit command opens a new window to which the incoming job streams are forwarded. The port on which the shadow process listens is assigned by the OS, but can be forced through the ListenerPort attribute in the JDL. As the command in this case opens an X window, the user should make sure that the DISPLAY environment variable is correctly set, that an X server is running on the local machine and, if she/he is connected to the UI node from a remote machine (e.g. with ssh), that secure X11 tunneling is enabled. If this is not possible, the user can specify the --nogui option, which makes the command provide a simple standard non-graphical interaction with the running job.

Another option that is reserved for interactive jobs is --nolisten: it makes the command forward the job standard streams coming from the WN to named pipes on the UI machine, whose names are returned to the user together with the OS id of the listener process. This allows the user to interact with the job through her/his own tools. It is important to note that when this option is specified, the UI no longer has control over the launched listener process, which therefore has to be killed by the user (through the returned process id) when the job is finished.


For interactive jobs the UI automatically requires outbound IP connectivity on the WN for the job, by adding (in AND with the user-defined expression) other.GlueHostNetworkAdapterOutboundIP to the JDL Requirements expression.

Checkpointable jobs are specified by setting the JDL JobType attribute to “Checkpointable”. When a checkpointable job is submitted, the user can specify the number (or list) of steps into which the job can be logically decomposed and the step to be considered as the initial one. This is done by setting the JDL attributes JobSteps and CurrentStep respectively. CurrentStep is a mandatory attribute; if not provided by the user, it is set automatically to 0 by the UI.

The --chkpt option allows the submission of a checkpointable job specifying as input a checkpoint state generated by a previously submitted job. This option makes the submitted job start running from the given checkpoint state rather than from the very beginning. The initial checkpoint states to be used with this option can be retrieved by means of the edg-job-get-chkpt command (see 6.1.3.8). A checkpoint state is a JDL file as described in [R3].

MPI jobs are specified by setting the JDL JobType attribute to “MPICH”. When an MPI job is submitted, the NodeNumber attribute (which specifies the required number of CPUs) is mandatory in the JDL, and the UI automatically requires the MPICH runtime environment to be installed on the CE and a number of CPUs at least equal to the required number of nodes. This is done by adding (in AND with the user-defined expression) the following expression

(other.GlueCEInfoTotalCPUs >= NodeNumber) && Member(other.GlueHostApplicationSoftwareRunTimeEnvironment,"MPICH")

to the JDL Requirements expression.

Lastly, the --nomsg option makes the command display neither messages nor errors on the standard output. Only the edg_jobId assigned to the job is printed if the command was successful. Otherwise the location of the generated log file containing the error messages is printed on the standard output. This option has been provided to make it easier to use the edg-job-submit command inside scripts, as an alternative to the --output option.

It is important to note that edg-job-submit is a sort of fire-and-forget command, i.e. it exits successfully once the JDL has been passed to the NS and the InputSandbox files have been transferred. What happens to the job afterwards does not affect the command. The reason for a job abort can however be determined by using edg-job-status (especially looking at the “Status Reason” field) and edg-job-get-logging-info on the job identifier returned from the submission.

JOB DESCRIPTION FILE

A job description file contains a description of job characteristics and constraints in class-ad style. A general description of the class-ad language is provided in document [A5]. The job description file must be edited by the user to insert relevant information about the job that is later needed by the RB to perform the match-making.

Job description file entries are strings having the format attribute = expression and are terminated by the semicolon character. Attribute expressions can span several lines, provided the semicolon is put only at the end of the whole expression. Comments must be preceded by a sharp character (#) or follow the C++ syntax, i.e. a double slash (//) at the beginning of each line, or statements begun and ended respectively with “/*” and “*/”.
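The entry and comment syntax just described can be illustrated with a small Python sketch. It is not the UI parser: the function name split_jdl_entries is hypothetical, and the sketch assumes that "#" and "//" start a comment running to the end of the line, that "/* ... */" is a block comment, and that none of these markers (nor the semicolon) is special inside double-quoted strings. An enclosing "[ ... ]" pair is not handled.

```python
def split_jdl_entries(text):
    """Split JDL text into (attribute, expression) pairs.

    Illustrative only: comments are dropped, entries end at ';',
    and everything inside double quotes is copied verbatim.
    """
    entries, buf = [], []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            buf.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep an escaped character (e.g. \") as part of the string.
                buf.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            buf.append(ch)
        elif ch == "#" or text.startswith("//", i):
            # Line comment: skip up to (not including) the newline.
            while i < n and text[i] != "\n":
                i += 1
            continue
        elif text.startswith("/*", i):
            # Block comment: skip past the closing marker, if any.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        elif ch == ";":
            entry = "".join(buf).strip()
            buf = []
            if entry:
                # The first '=' separates the attribute from its expression.
                attribute, _, expression = entry.partition("=")
                entries.append((attribute.strip(), expression.strip()))
        else:
            buf.append(ch)
        i += 1
    return entries
```

An expression that spans several lines keeps its internal line breaks in this sketch; only leading and trailing whitespace is removed.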


Since class-ad is an extensible language, there is no fixed set of admitted attributes: the user can insert in the job description file any attribute she/he considers meaningful to describe her/his jobs. However, only the attributes that can be related in some way to the resource attributes published in the IS are taken into account by the Matchmaker/RB for the match-making process. Unrelated attributes are simply ignored, except when they are used to build the Requirements expression; in that case they are evaluated and could affect the match-making result.

The attributes taken into account by the RB, together with their meaning, are listed in annex 7.1 and described in detail in document [A1], which is the document the user has to follow when writing the JDL description of her/his jobs.

A small subset of JDL attributes is compulsory, i.e. they have to be present in a job class-ad before it is sent to the Network Server so that match-making and submission can be performed. They fall into two categories: some must be provided by the user, whilst others, if not provided, are filled in by the UI with configurable default values. Table 1 summarises this.

Attribute             Mandatory                                             Mandatory with default value (default value)
Type                  -                                                     yes (“Job”)
JobType               -                                                     yes (“Normal”)
Executable            yes                                                   -
Requirements          -                                                     yes (other.GlueCEStateStatus == "Production") [configurable]
Rank                  -                                                     yes (other.GlueCEStateFreeCPUs for MPICH jobs; -other.GlueCEStateEstimatedResponseTime for all other job types) [configurable]
NodeNumber            yes (only for MPICH jobs)                             -
CurrentStep           -                                                     yes (0) (only for checkpointable jobs)
VirtualOrganisation   -                                                     yes [configurable]
DataAccessProtocol    yes (only if InputData has been specified)            -
InputData             yes (only if DataAccessProtocol has been specified)   -

Table 1 Mandatory Attributes

In Table 1 the default values for Requirements and Rank can be interpreted as follows:
- If the user has not provided job constraints, Requirements is set to (other.GlueCEStateStatus == "Production"), i.e. the target CE has to be active.
- In the JDL, the greater the value of Rank, the better the match is considered. If no expression for Rank has been provided, the resources where jobs wait the shortest time to pass from the SCHEDULED to the RUNNING status are preferred; hence the Rank expression is set to (-other.GlueCEStateEstimatedResponseTime). MPICH jobs are an exception: their default rank is other.GlueCEStateFreeCPUs, meaning that the preferred resources are the ones with the highest number of free CPUs.
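The rules for mandatory attributes and defaults can be summarised in a short Python sketch. It is only an illustration of the behaviour described in this section, not the UI code: the function name apply_ui_defaults, the representation of a job as a dict of expression strings, and the default_vo parameter are all assumptions made for the example, and the configurable defaults are hard-coded to the values quoted in the text.

```python
DEFAULT_REQUIREMENTS = 'other.GlueCEStateStatus == "Production"'


def apply_ui_defaults(jdl, default_vo=None):
    """Return a copy of the job attributes with defaults filled in.

    `jdl` maps attribute names to expression strings (an assumption of
    this sketch). Raises ValueError if a mandatory attribute is missing.
    """
    ad = dict(jdl)
    if "Executable" not in ad:
        raise ValueError("Executable is mandatory and has no default")
    ad.setdefault("Type", '"Job"')
    ad.setdefault("JobType", '"Normal"')
    job_type = ad["JobType"].strip('"')

    if job_type == "MPICH" and "NodeNumber" not in ad:
        raise ValueError("NodeNumber is mandatory for MPICH jobs")
    if job_type == "Checkpointable":
        ad.setdefault("CurrentStep", "0")
    # InputData and DataAccessProtocol are each mandatory if the other is set.
    if ("InputData" in ad) != ("DataAccessProtocol" in ad):
        raise ValueError("InputData and DataAccessProtocol require each other")

    ad.setdefault("Requirements", DEFAULT_REQUIREMENTS)
    if job_type == "MPICH":
        ad.setdefault("Rank", "other.GlueCEStateFreeCPUs")
    else:
        ad.setdefault("Rank", "-other.GlueCEStateEstimatedResponseTime")

    # Clauses that the UI puts in AND with the Requirements expression.
    extra = []
    if job_type == "Interactive":
        extra.append("other.GlueHostNetworkAdapterOutboundIP")
    if job_type == "MPICH":
        extra.append("(other.GlueCEInfoTotalCPUs >= NodeNumber)")
        extra.append(
            'Member(other.GlueHostApplicationSoftwareRunTimeEnvironment,"MPICH")'
        )
    if extra:
        ad["Requirements"] = " && ".join(["(" + ad["Requirements"] + ")"] + extra)

    if default_vo is not None:
        ad.setdefault("VirtualOrganisation", '"%s"' % default_vo)
    return ad
```

For instance, apply_ui_defaults({"Executable": '"sum.exe"'}) fills in Type, JobType, Requirements and Rank, while a job with JobType set to "MPICH" and no NodeNumber is rejected.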

The default values for the Requirements and Rank attributes can be set in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf file. See section 4.5.3.1 for details on how to set these defaults.

As the classad language (and hence the JDL) is extensible, the user can freely include new attributes in the job description. These attributes are ignored by the WMS for scheduling but are passed through by the UI (if their syntax is correct), since they could be relevant for the submitter or for some other component processing the JDL. However, if the job description file contains attributes that are unknown to the WMS, the UI prints a warning (when used with the --debug option) listing all of them.

OPTIONS

--help

displays command usage.

--version

displays UI version.

--vo vo_name

This option allows the user to specify the Virtual Organisation she/he is currently working for. If the user proxy contains VOMS extensions then the VO specified through this option is overridden by the default VO contained in the proxy (i.e. this option is only useful when working with non-VOMS proxies). The following precedence rule is followed for determining the user's VO:
− the default VO from the user proxy (if it contains VOMS extensions),
− the VO specified through the --vo or --config-vo options,
− the VO specified in the configuration file pointed to by the EDG_WL_UI_CONFIG_VO environment variable,
− the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS extensions this value is overridden as above),
− the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultVo field) configuration file.

If none of these sources yields a VO, an error is returned and the submission is aborted.
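The precedence rule amounts to taking the first available value from an ordered list of sources. A minimal Python sketch follows; the function name resolve_vo and its parameters are hypothetical, and each argument stands for a value the UI would have already read from the corresponding source (None meaning that the source does not provide a VO).

```python
def resolve_vo(proxy_vo=None, option_vo=None, env_config_vo=None,
               jdl_vo=None, default_vo=None):
    """Return (vo, source) for the first source that provides a VO."""
    candidates = [
        ("VOMS proxy", proxy_vo),
        ("--vo / --config-vo option", option_vo),
        ("EDG_WL_UI_CONFIG_VO configuration file", env_config_vo),
        ("JDL VirtualOrganisation attribute", jdl_vo),
        ("DefaultVo in edg_wl_ui_cmd_var.conf", default_vo),
    ]
    for source, vo in candidates:
        if vo:
            return vo, source
    # No source provided a VO: the command reports an error and aborts.
    raise LookupError("unable to determine the Virtual Organisation")
```

With a non-VOMS proxy, resolve_vo(option_vo="cms", jdl_vo="atlas") selects "cms"; as soon as proxy_vo is set, it wins over every other source.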

--resource ce_id -r ce_id

if the command is launched with this option, the job-ad sent to the NS contains a line of the type SubmitTo = ce_id and the job is submitted by the WMS to the resource identified by ce_id without going through the match-making process. Accepted format for the CEId is: <full hostname>:<port number>/jobmanager-<service>-<queue name> where <service> could be for example lsf, pbs, bqs, condor but can also be a different string as it is freely set by the site administrator when setting the queue. Note that when this option is used, the “.BrokerInfo” file is not generated.
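The CEId format above can be checked and split with a regular expression. The Python sketch below is illustrative only: parse_ce_id is a hypothetical helper, and it assumes that the <service> part contains no hyphen, which the text does not guarantee since the string is freely chosen by the site administrator.

```python
import re

# <full hostname>:<port number>/jobmanager-<service>-<queue name>
_CE_ID = re.compile(
    r"^(?P<host>[^:/\s]+):(?P<port>\d+)"
    r"/jobmanager-(?P<service>[^-\s]+)-(?P<queue>\S+)$"
)


def parse_ce_id(ce_id):
    """Split a CEId into (hostname, port, service, queue name)."""
    match = _CE_ID.match(ce_id)
    if match is None:
        raise ValueError("not a valid CEId: %r" % ce_id)
    return (
        match.group("host"),
        int(match.group("port")),
        match.group("service"),
        match.group("queue"),
    )
```

For a CEId such as bbq.mi.infn.it:2119/jobmanager-pbs-dque, the sketch yields the host bbq.mi.infn.it, the port 2119, the service pbs and the queue dque.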

--input file_path -i file_path

if this option is specified, the user is asked to choose a CEId from the list of CEs contained in file_path. Once a CEId has been selected, the command behaves as explained for the --resource option. If this option is used together with --noint and the input file contains more than one CEId, the first CEId in the list is used for submitting the job.

--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--config-vo file_path

if the command is launched with this option, the vo-specific configuration file pointed to by file_path is used instead of the standard vo-specific configuration file.


--output file_path -o file_path

writes the edg_jobId assigned to the submitted job to the file specified by file_path. file_path can be either a simple name or an absolute path (on the submitting machine). In the former case the file is created in the current working directory.

--chkpt file_path

This option can be used only for checkpointable jobs. The state specified as input is a checkpoint state generated by a previously submitted job. This option makes the submitted job start running from the checkpoint state given in input and not from the very beginning. The initial checkpoint states to be used with this option can be retrieved by means of the edg-job-get-chkpt command (see 6.1.3.8).

--nogui

This option can be used only for interactive jobs. As the command opens an X window for such jobs, the user should make sure an X server is running on the local machine and, if she/he is connected to the UI node from a remote machine (e.g. with ssh), enable secure X11 tunneling. If this is not possible, the user can specify the --nogui option, which makes the command provide a simple non-graphical interaction with the running job.

--nolisten

This option can be used only for interactive jobs. It makes the command forward the job standard streams coming from the WN to named pipes on the UI machine, whose names are returned to the user together with the OS id of the listener process. This allows the user to interact with the job through her/his own tools. It is important to note that when this option is specified the UI no longer has control over the launched listener process, which therefore has to be killed by the user (through the returned process id) once the job is finished.

--nomsg

this option makes the command print on the standard output only the edg_jobId generated for the job if the submission was successful; otherwise the location of the log file containing messages and diagnostics is printed.

--noint

if this option is specified every interactive question to the user is skipped and all warning messages and errors (if any) are written to the file edg-job-submit_<UID>_<PID>_<timestamp>.log under the /tmp directory. Log file location is configurable.


--debug

when this option is specified, information about the parameters used for the API function calls inside the command is displayed on the standard output and is also written to the file edg-job-submit_<UID>_<PID>_<timestamp>.log under the /tmp directory. Log file location is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed to by file_path.

jdl_file

this is the file containing the JDL describing the job to be submitted. It must be the last argument of the command.

EXIT STATUS

edg-job-submit exits with a status value of 0 (zero) upon success, and >0 (greater than zero) upon failure.

EXAMPLES

1. $> edg-job-submit --vo cms myjob1.jdl

where myjob1.jdl is as follows:

##############################################
#
# -------- Job description file ---------- #
##############################################
[
JobType = "Normal" ;
Executable = "$(CMS)/fpacini/exe/sum.exe";
InputData = "lfn:testbed0-00019";
DataAccessProtocol = "gridftp";
Rank = other.GlueCEPolicyMaxCPUTime;
Requirements = other.GlueCEInfoLRMSType == "Condor" && \
(!(RegExp("*nikhef*",other.GlueCEUniqueID)));
]

submits sum.exe to a resource (supposed to contain the executable file) whose LRMS is Condor and whose CE identifier does not contain the string “nikhef”. The command returns the following output to the user, containing the job handle (edg_jobId):

================= edg-job-submit Success ==================================
The job has been successfully submitted to the Network Server.
Your job is identified by (edg_jobId):
https://ibm139.cnaf.infn.it:9000/ZU9yOC7AP7AOEhMAHirG3
Use edg-job-status command to display current job status.
======================================================================

2. $> edg-job-submit --chkpt /home/test/state10.chkpt myjob2.jdl

Submits the checkpointable job described by myjob2.jdl, which will start running from the initial state state10.chkpt.

SEE ALSO

[A1], [A2], edg-job-list-match, edg-job-attach, edg-job-get-chkpt.


6.1.3.2. edg-job-get-output

This command retrieves the job output files (specified by the OutputSandbox attribute of the job-ad) from the RB node and stores them on the local disk of the submitting machine.

SYNOPSIS

edg-job-get-output [options] <job Id(s)>

Options:

--help

--version

--input, -i <file_path>

--dir <directory_path>

--config, -c <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

The edg-job-get-output command can be used to retrieve the output files of a job that has been submitted through the edg-job-submit command with a job description file including the OutputSandbox attribute. Once the job has terminated its execution, the user can download the files generated by the job and temporarily stored on the RB machine, as specified by the OutputSandbox attribute, by issuing edg-job-get-output with the edg_jobId returned by edg-job-submit as input.

It is also possible to specify a list of job identifiers when calling this command, or an input file containing edg_jobIds by means of the --input option. When --input is used, the user is asked to choose all, one or a subset of the job identifiers contained in the input file.

It is important to note that the OutputSandbox of a submitted job can only be retrieved when the job has reached the Done status (see Annex 7.2), indicating that the job has successfully terminated its execution and the OutputSandbox files are ready for retrieval on the RB node. edg-job-get-output will always fail for jobs that are not yet in the Done status.

The user can choose the local directory path on the UI machine where these files have to be stored by means of the --dir option; otherwise the retrieved files are put in a default location specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf configuration file (OutputStorage parameter). In both cases a sub-directory is added to the path supplied. The name of this sub-directory is the unique string of the edg_jobId identifier (see command edg-job-submit for details on the edg_jobId structure) prefixed by the user login name (value of the LOGNAME environment variable).


If the user wants to use her/his “private” configuration file, this can be done using the option --config path_name. The edg-job-get-output command then looks for the file path_name instead of the standard configuration file. If this file does not exist, the user is notified with an error message and the command is aborted.

OPTIONS

--help

displays command usage.

--version

displays UI version.

--dir directory_path

retrieved files (previously listed by the user through the OutputSandbox attribute of the job description file) are stored in the location indicated by directory_path/<login name>_<edg_jobId unique string>.

--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file edg-job-get-output_<UID>_<PID>_<timestamp>.log under the /tmp directory. Location of log file is configurable.

--debug

when this option is specified, information about the parameters used for the API function calls inside the command is displayed on the standard output and is also written to the file edg-job-get-output_<UID>_<PID>_<timestamp>.log under the /tmp directory. Location of log file is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed to by file_path.

edg_jobId

job identifier returned by edg-job-submit. If a list of one or more job identifiers is specified, the edg_jobIds have to be separated by a blank. Job identifiers must be the last argument of the command.

--input file_path -i file_path

this option makes the command return the OutputSandbox files for each edg_jobId contained in the file_path. This option can't be used if one (or more) edg_jobIds have been already specified. The format of the input file must be as follows: one edg_jobId for each line and comment lines must begin with a "#" or a "*" character. See 6.1.2.1 for details about this option.
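The input file format (one edg_jobId per line, comment lines starting with "#" or "*") is simple enough to be read with a few lines of Python. The helper below is a sketch with a hypothetical name, read_job_ids; skipping empty lines is an assumption of the example, not something stated in this guide.

```python
def read_job_ids(lines):
    """Return the edg_jobIds found in an iterable of text lines."""
    job_ids = []
    for line in lines:
        line = line.strip()
        # Skip blank lines (assumption) and comment lines.
        if not line or line[0] in "#*":
            continue
        job_ids.append(line)
    return job_ids
```

It can be applied directly to an open file, e.g. read_job_ids(open(file_path)).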

EXIT STATUS

edg-job-get-output exits with a status value of 0 (zero) upon success, >0 upon failure and <0 upon partial failure. An example of partial failure is when more than one job identifier has been specified and the OutputSandbox could be retrieved only for some of them.


EXAMPLES

Let us consider the following command issued by the user logged in as mrossi:

$> edg-job-get-output https://ibm139.cnaf.infn.it:9000/CiXMLojKC_iLsvSHfEhqIQ --dir /home/data

It retrieves the files listed in the OutputSandbox attribute of the job identified by https://ibm139.cnaf.infn.it:9000/CiXMLojKC_iLsvSHfEhqIQ from the RB node and stores them locally in /home/data/mrossi_CiXMLojKC_iLsvSHfEhqIQ.
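The naming of the local sub-directory in this example can be reproduced with a short Python sketch. The helper name output_directory is hypothetical, and the sketch assumes, based on the example above, that the unique string of an edg_jobId is the path component of the identifier URL.

```python
import posixpath
from urllib.parse import urlparse


def output_directory(base_dir, login, job_id):
    """Build <base_dir>/<login name>_<edg_jobId unique string>."""
    # Assumption: the unique string is the URL path without its leading '/'.
    unique = urlparse(job_id).path.lstrip("/")
    return posixpath.join(base_dir, login + "_" + unique)
```

Called with /home/data, mrossi and the job identifier of the example, it returns the directory quoted above.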


6.1.3.3. edg-job-list-match

Returns the list of resources fulfilling the job requirements specified in the JDL job description.

SYNOPSIS

edg-job-list-match [options] <jdl file>

Options:

--help

--version

--verbose

--rank

--config, -c <file_path>

--config-vo <file_path>

--vo <vo_name>

--output, -o <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

edg-job-list-match displays the list of identifiers of the resources on which the user is authorized and which satisfy the job requirements included in the job description file. The CE identifiers are returned either on the standard output or in a file, according to the chosen command options, and are strings univocally identifying the CEs published in the IS. The returned CEIds are listed in decreasing order of rank, i.e. the one with the best (greatest) rank is in the first place, and so on. The --rank option makes the command also display the rank value for each CEId found.

The --vo option allows the user to specify the Virtual Organisation she/he is currently working for when working with non-VOMS credentials. Indeed, if the user proxy credentials currently available on the UI contain VOMS extensions specifying one or more VOs, the default VO from the proxy credentials has precedence over all other possible choices and is taken as the current working VO. If the --vo option is not used (and the proxy credentials do not contain extensions), the VirtualOrganisation attribute in the JDL is considered. If this attribute has not been specified in the JDL, the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultVo field) configuration file is considered. Otherwise an error is returned to the user.

edg-job-list-match requires a job description file in which job characteristics and requirements are expressed by means of a class-ad. The job description file, passed as the main command-line argument to edg-job-list-match, is first syntactically checked. The Network Server is only contacted to find job compatible resources; the job is not submitted. See the edg-job-submit section 6.1.3.1 for general rules for building the job description file.

If the user wants to use her/his “private” configuration file, this can be done using the option --config path_name.

The --verbose option of the edg-job-list-match command can be used to obtain on the standard output the class-ad sent to the RB, generated from the job description.

The --output option makes the command save the list of compatible resources into the specified file. If the provided file name is not an absolute path, the output file is created in the current working directory.

JOB DESCRIPTION FILE

See section 6.1.3.1 for details.


OPTIONS

--help

displays command usage.

--version

displays UI version.

--verbose -v

displays on the standard output the job class-ad that is sent to the Network Server, generated from the job description file. This differs from the content of the job description file, since the UI adds to it some attributes that cannot be directly inserted by the user (e.g. defaults for Rank and Requirements if not provided, VirtualOrganisation, etc.).

--rank

displays the “matching” CEIds and the associated ranking values.

--vo vo_name

This option allows the user to specify the Virtual Organisation she/he is currently working for. If the user proxy contains VOMS extensions then the VO specified through this option is overridden by the default VO contained in the proxy (i.e. this option is only useful when working with non-VOMS proxies). The following precedence rule is followed for determining the user's VO:
− the default VO from the user proxy (if it contains VOMS extensions),
− the VO specified through the --vo or --config-vo options,
− the VO specified in the configuration file pointed to by the EDG_WL_UI_CONFIG_VO environment variable,
− the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS extensions this value is overridden as above),
− the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultVo field) configuration file.

If none of these sources yields a VO, an error is returned and the submission is aborted.


--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--config-vo file_path

if the command is launched with this option, the vo-specific configuration file pointed to by file_path is used instead of the standard vo-specific configuration file.

--output file_path -o file_path

returns the CEIds list in the file specified by file_path. file_path can be either a simple name or an absolute path (on the submitting machine). In the former case the file file_path is created in the current working directory.

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file edg-job-list-match_<UID>_<PID>_<timestamp>.log under the /tmp directory. Location of the log file is configurable.

--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file edg-job-list-match_<UID>_<PID>_<timestamp>.log under the /tmp directory. Location of the log file is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed by file_path

jdl_file

this is the file containing the classad describing the job to be submitted. It must be the last argument of the command.

EXIT STATUS

edg-job-list-match exits with a status value of 0 upon success, and a >0 value upon failure.


EXAMPLES

Let us consider the following command:

$> edg-job-list-match myjob.jdl

where the job description file myjob.jdl looks like:

#########################################
#
# ---- Sample Job Description File ----
#
#########################################
JobType = "Normal";
Executable = "sum.exe";
StdInput = "data.in";
InputSandbox = {"/home_firefox/fpacini/exe/sum.exe","/home1/data.in"};
OutputSandbox = {"data.out","sum.err"};
Rank = other.GlueCEPolicyMaxCPUTime;
Requirements = other.GlueCEInfoLRMSType == "Condor" &&
other.GlueHostArchitecturePlatformType == "INTEL" &&
other.GlueHostOperatingSystemName == "LINUX" && other.GlueCEStateFreeCPUs >= 2;

In this case the job requires CEs that are Condor pools of INTEL LINUX machines with at least 2 free CPUs. Moreover, the Rank expression states that queues with a higher maximum CPU time allowed for jobs are preferred. The response to such a command looks like the following:

***************************************************************************
Computing Element IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
bbq.mi.infn.it:2119/jobmanager-pbs-dque
skurut.cesnet.cz:2119/jobmanager-pbs-wp1
***************************************************************************
$>

SEE ALSO

[A1],[A2], edg-job-submit.
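The filtering and ordering performed for edg-job-list-match (keep the CEs satisfying Requirements, then list them by decreasing Rank) can be sketched in Python as follows. This is not the Matchmaker implementation: CEs are modelled as plain dicts of Glue attributes, and list_match, requirements and rank are hypothetical names. The two example functions mirror only part of the JDL of the example above (the LRMS type and free CPU conditions, and the Rank expression).

```python
def list_match(ces, requirements, rank):
    """Return (CEId, rank value) pairs for matching CEs, best rank first."""
    matches = [
        (ce["GlueCEUniqueID"], rank(ce)) for ce in ces if requirements(ce)
    ]
    # The greater the rank value, the better the match.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches


def requirements(ce):
    # Part of the Requirements expression of the example JDL.
    return (ce["GlueCEInfoLRMSType"] == "Condor"
            and ce["GlueCEStateFreeCPUs"] >= 2)


def rank(ce):
    # Rank = other.GlueCEPolicyMaxCPUTime
    return ce["GlueCEPolicyMaxCPUTime"]
```

The second element of each returned pair corresponds to the value that the --rank option displays next to each CEId.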


6.1.3.4. edg-job-cancel

Cancels one or more submitted jobs.

SYNOPSIS

edg-job-cancel [options] <job Id(s)>

Options:

--help

--version

--all

--input, -i <file_path>

--config, -c <file_path>

--config-vo <file_path>

--vo <vo_name>

--output, -o <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

This command cancels a job previously submitted using edg-job-submit. Before cancellation, it prompts the user for confirmation. The cancel request is sent to the Network Server, which forwards it to the WM, which fulfils it.

edg-job-cancel can remove one or more jobs: the jobs to be removed are identified by their job identifiers (edg_jobIds returned by edg-job-submit), provided as arguments to the command and separated by a blank space. The result of the cancel operation is reported to the user for each specified edg_jobId.

If the --all option is specified, all the jobs owned by the user submitting the command are removed. When the command is launched with the --all option, no edg_jobId can be specified. Note that only the owner of a job can remove it.

When the --all option is specified the UI queries each LB listed in the vo-specific configuration file $EDG_WL_LOCATION/etc/<vo_name>/edg_wl_ui.conf to get the identifiers of all the jobs owned by the user, who is identified by her/his certificate subject. Afterwards the UI sends a cancellation request to the NS for each job that is in a status for which cancellation is allowed. Job states for which cancellation is allowed are:

- Submitted
- Waiting
- Ready
- Scheduled
- Running
- Unknown

For all the other job states the cancellation request will result in a failure.

If the user wants to use her/his “private” configuration file, this can be done using the option --config file_path.

The --input option allows the user to specify a file (file_path) that contains the edg_jobIds to be removed. The format of the file must be as follows: one edg_jobId per line, and comment lines must begin with a “#” or a “*” character. When using this option the user is asked to choose among all, one or a subset of the listed job identifiers. If file_path is not an absolute path, the file is searched for in the current working directory.
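The selection that the UI performs for the --all option (send a cancellation request only for jobs in a state that allows it) can be sketched in Python as below. The name select_cancellable and the job_id-to-state mapping are assumptions of the example; in the real command the states come from the LB queries described above.

```python
CANCELLABLE_STATES = frozenset(
    {"Submitted", "Waiting", "Ready", "Scheduled", "Running", "Unknown"}
)


def select_cancellable(job_states):
    """Split jobs into those that can be cancelled and those that cannot.

    `job_states` maps each edg_jobId to its current status string.
    """
    to_cancel, rejected = [], []
    for job_id, state in job_states.items():
        if state in CANCELLABLE_STATES:
            to_cancel.append(job_id)
        else:
            # A cancellation request for these jobs would fail.
            rejected.append(job_id)
    return to_cancel, rejected
```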


OPTIONS

--help

displays command usage.

--version

displays UI version.

--all

cancels all jobs owned by the user submitting the command. This option can’t be used if one or more edg_jobIds have been specified explicitly or together with the --input option.

--input file_path -i file_path

cancels the edg_jobIds contained in file_path. This option can’t be used if one or more edg_jobIds have been specified or together with the --all option. See 6.1.2.1 for details about this option.

--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--config-vo file_path

if the command is launched with this option, the vo-specific configuration file pointed to by file_path is used instead of the standard vo-specific configuration file. This option is allowed only when used together with the --all one.

--vo vo_name

This option allows the user to specify the Virtual Organisation she/he is currently working for. If the user proxy contains VOMS extensions, the VO specified through this option is overridden by the default VO contained in the proxy (i.e. this option is only useful when working with non-VOMS proxies). The following precedence rule is followed for determining the user's VO:

− the default VO from the user proxy (if it contains VOMS extensions),
− the VO specified through the --vo or --config-vo options,
− the VO specified in the configuration file pointed to by the EDG_WL_UI_CONFIG_VO environment variable,
− the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS extensions this value is overridden as above),
− the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultVo field) configuration file.

If none of these attempts succeeds, an error is returned and the command is aborted. This option is not allowed when one or more edg_jobIds are specified as command arguments.

--output file_path -o file_path

writes the cancel results in the file specified by file_path instead of the standard output. file_path can be either a simple name or an absolute path (on the submitting machine). In the former case the file file_path is created in the current working directory.

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file edg-job-cancel_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file edg-job-cancel_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed to by file_path.

edg_jobId

job identifier returned by edg-job-submit. The job identifier list must be the last argument of this command.

EXIT STATUS

edg-job-cancel exits with a status value of 0 if all the specified jobs were cancelled successfully, >0 if errors occurred for every specified job id and <0 in case of partial failure. A partial failure can occur when more than one job has been specified: some jobs may be removed successfully while others are not.
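The exit status convention above (0 for complete success, >0 when every job failed, <0 for partial failure) can be sketched as follows. The function name aggregate_exit_status and the concrete values 1 and -1 are assumptions made for illustration; the guide only specifies the sign of the status.

```python
from typing import Iterable


def aggregate_exit_status(job_succeeded: Iterable[bool]) -> int:
    """Map per-job outcomes to the sign convention of the exit status.

    The values 1 and -1 are placeholders chosen for this sketch;
    only their sign follows the guide.
    """
    outcomes = list(job_succeeded)
    if not outcomes:
        raise ValueError("at least one job outcome is required")
    if all(outcomes):
        return 0   # every specified job was handled successfully
    if not any(outcomes):
        return 1   # errors occurred for every specified job
    return -1      # partial failure: some jobs succeeded, some did not
```

For instance, three outcomes of which one failed yield a negative value, signalling a partial failure.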


EXAMPLES

1. $> edg-job-cancel --input joblist.txt

where joblist.txt is a file containing 3 edg_jobIds, displays the following confirmation message:

Are you sure you want to remove all jobs specified? [y/n]n: y

====================== edg-job-cancel Success =========================

The cancel request for the following job(s) has been successfully submitted to NS:

- https://ibm139.cnaf.infn.it:9000/nUbiIiMFmY1oIusAaWxPhg

- https://ibm139.cnaf.infn.it:9000/VtMvhs8z7WGCptt92ZMPIQ

- https://ibm139.cnaf.infn.it:9000/yKTKyrdSgHKQ1wwwSocJiw

========================================================================

$>

In this case the command exit code is 0.

2. $> edg-job-cancel --all --noint

removes all jobs owned by the user submitting the command.

SEE ALSO

[A2], edg-job-submit.


6.1.3.5. edg-job-status

Displays bookkeeping information about submitted jobs.

SYNOPSIS

edg-job-status [options] <job Id(s)>

Options:

--help

--version

--all

--input, -i <file_path>

--verbosity, -v <verbosity_value>

--config, -c <file_path>

--config-vo <file_path>

--vo <vo_name>

--output, -o <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

This command prints the status of a job previously submitted using edg-job-submit. The job status request is sent to the LB, which provides the requested information. This can be done at any time during the job's life.

edg-job-status can monitor one or more jobs: the jobs to be checked are identified by one or more job identifiers (edg_jobIds returned by edg-job-submit), provided as arguments to the command and separated by a blank space.

If the --all option is specified, information about all the jobs owned by the user submitting the command is printed on the standard output. When the command is launched with the --all option, neither an edg_jobId nor the --input option can be specified.

The --input option allows the user to specify a file (file_path) that contains the edg_jobIds to monitor. The file must contain one edg_jobId per line; comment lines must begin with a “#” or a “*” character. When using this option the user is asked to choose among all, one or a subset of the listed job identifiers. If file_path is not an absolute path, the file is searched for in the current working directory.

If the user wants to use her/his “private” configuration file, this can be done with the --config file_path option. The same applies to the vo-specific configuration file and the --config-vo option.


The --verbosity option sets the detail level of the returned information. It accepts three values: 0, 1 and 2. The default verbosity level is 0 unless otherwise specified in the UI configuration file $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultStatusLevel parameter). The information displayed at each verbosity level is listed below.

With verbosity equal 0:

- edg_jobId (the job unique identifier)
- Current Status (the job current status)
- exit_code (Unix exit code, if applicable)
- Status Reason (reason for being in this state)
- Reached on (date/time when the job entered the current state)
- destination (ID of the CE where the job has been submitted, if applicable)

With verbosity equal 1, some additional information fields are added such as:

- cancelling (boolean indicating if a cancellation request for the job is in progress)
- cancelReason (Reason of cancel)
- ce_node (Worker node where the job is executed)
- children_hist (summary -- histogram -- of children job states)
- children_num (number of subjobs)
- subjob_failed (Subjob failed -- the parent job will fail too)
- condorId (Id within Condor-G)
- cpuTime (Consumed CPU time)
- expectUpdate (Boolean indicating that some logged information has not arrived yet)
- expectFrom (Sources of the missing information)
- jobtype (Type of the request: 0 = Job, 1 = DAG)
- lastUpdateTime (Last known event of the job)
- location (location where the job is being processed)
- network_server (Network server handling the job)
- owner (certificate subject of Job owner)
- resubmitted (boolean indicating that the job was resubmitted)

Lastly, with verbosity equal 2 there are the following additional fields:

- jdl (User submitted job description)
- matched_jdl (Full job description after matchmaking)
- condor_jdl (ClassAd passed to Condor-G for job submission)
- rsl (Job RSL sent to Globus)
- stateEnterTimes (When all previous states were entered)

Information fields that are not available (i.e. not returned by the LB because not applicable for the given status) are not printed. The possible values of the job Status are reported in Annex 7.2. Details on the Job Status Diagram can be found in [A4].

OPTIONS

--help

displays command usage.

--version

displays UI version.

--all

displays status information about all jobs owned by the user submitting the command. This option can’t be used if one or more edg_jobIds have been specified or if the --input option has been specified. All LBs listed in the vo-specific UI configuration file $EDG_WL_LOCATION/etc/<vo_name>/edg_wl_ui.conf are contacted to fulfil this request.

--input input_file -i input_file

displays bookkeeping info about the edg_jobIds contained in input_file. When using this option the user is asked to choose among all, one or a subset of the listed job identifiers. This option can’t be used if one or more edg_jobIds have been specified or if the --all option has been specified. See 6.1.2 for details about this option.

--verbosity verb_level -v verb_level

sets the detail level of information about the job displayed to the user. Possible values for verb_level are 0, 1 and 2.

--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--config-vo file_path

if the command is launched with this option, the vo-specific configuration file pointed to by file_path is used instead of the standard vo-specific configuration file. This option is allowed only when used together with the --all one.

--vo vo_name

This option allows the user to specify the Virtual Organisation she/he is currently working for. If the user proxy contains VOMS extensions, the VO specified through this option is overridden by the default VO contained in the proxy (i.e. this option is only useful when working with non-VOMS proxies). The following precedence rule is followed for determining the user's VO:

− the default VO from the user proxy (if it contains VOMS extensions),
− the VO specified through the --vo or --config-vo options,
− the VO specified in the configuration file pointed to by the EDG_WL_UI_CONFIG_VO environment variable,
− the VirtualOrganisation attribute in the JDL (if the user proxy contains VOMS extensions this value is overridden as above),
− the default VO specified in the $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultVo field) configuration file.

If none of these attempts succeeds, an error is returned and the command is aborted. This option is allowed only when used together with the --all one.

--output file_path -o file_path

writes the bookkeeping information in the file specified by file_path instead of the standard output. file_path can be either a simple name or an absolute path (on the submitting machine). In the former case the file file_path is created in the current working directory.

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file edg-job-status_<UID>_<PID>_<timestamp>.log under the /tmp directory. Location of log file is configurable.


--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file edg-job-status_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed to by file_path.

edg_jobId

job identifier returned by edg-job-submit. Job identifiers must always be provided as last arguments of the command.
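The VO precedence rule given for the --vo option can be sketched as a first-match lookup over the five sources, in the order listed. The function and parameter names below are hypothetical and do not correspond to any UI API; each argument stands for the value found in one source, or None when that source provides nothing.

```python
from typing import Optional


def resolve_vo(
    proxy_voms_vo: Optional[str] = None,   # default VO from a VOMS proxy
    option_vo: Optional[str] = None,       # --vo or --config-vo option
    env_config_vo: Optional[str] = None,   # file named by EDG_WL_UI_CONFIG_VO
    jdl_vo: Optional[str] = None,          # VirtualOrganisation JDL attribute
    default_vo: Optional[str] = None,      # DefaultVo in edg_wl_ui_cmd_var.conf
) -> str:
    """Return the first VO found, following the documented precedence."""
    candidates = [proxy_voms_vo, option_vo, env_config_vo, jdl_vo, default_vo]
    for vo in candidates:
        if vo:
            return vo
    # None of the sources provided a VO: the command reports an error.
    raise LookupError("unable to determine the Virtual Organisation")
```

With this ordering, a VO carried by a VOMS proxy always wins over a value passed on the command line, which matches the remark that --vo is only useful with non-VOMS proxies.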

EXIT STATUS

edg-job-status exits with a value of 0 if the status of all the specified jobs is retrieved correctly, >0 if errors occurred for every specified job id and <0 in case of partial failure. A partial failure can occur when more than one job is specified: status info may be retrieved successfully for some jobs and not for others.


EXAMPLES

$> edg-job-status -v 0 https://ibm139.cnaf.infn.it:9000/_tO6hdgToYKGCuV68q-gqQ

displays the following lines:

*************************************************************

BOOKKEEPING INFORMATION:

Printing status info for the Job : https://ibm139.cnaf.infn.it:9000/_tO6hdgToYKGCuV68q-gqQ

Current Status: Scheduled

Destination: bbq.mi.infn.it:2119/jobmanager-pbs-dque

Status Reason: Job successfully submitted to Globus

reached on: Tue May 6 16:14:59 2003

*************************************************************

$>
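When the status report has to be processed by a script, the “name: value” lines of the example above can be collected into a dictionary. The sketch below assumes that the output always has the shape shown in this example, which the guide does not guarantee; parse_status_output is a hypothetical helper, not part of the UI.

```python
from typing import Dict

# Sample text copied from the example above.
SAMPLE_OUTPUT = """\
*************************************************************
BOOKKEEPING INFORMATION:
Printing status info for the Job : https://ibm139.cnaf.infn.it:9000/_tO6hdgToYKGCuV68q-gqQ
Current Status: Scheduled
Destination: bbq.mi.infn.it:2119/jobmanager-pbs-dque
Status Reason: Job successfully submitted to Globus
reached on: Tue May 6 16:14:59 2003
*************************************************************
"""


def parse_status_output(text: str) -> Dict[str, str]:
    """Collect 'name: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"*"}:
            continue  # skip blank lines and separator lines
        # Split on the first colon only: values may contain colons too.
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:
            fields[key.strip()] = value
    return fields
```

Lines without a value, such as the “BOOKKEEPING INFORMATION:” banner, are ignored by this sketch.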

SEE ALSO [A1], [A2], [A4], edg-job-submit.


6.1.3.6. edg-job-get-logging-info

Displays logging information about submitted jobs.

SYNOPSIS

edg-job-get-logging-info [options] <job Id(s)>

Options:

--help

--version

--input, -i <file_path>

--verbosity, -v <verbosity_value>

--config, -c <file_path>

--output, -o <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

This command queries the LB persistent DB for logging information about jobs previously submitted using edg-job-submit. The job logging information is stored permanently by the LB service and can still be retrieved after the job has completed its life-cycle, unlike the bookkeeping information, which is in some way “consumed” by the user during the job's existence. The edg-job-get-logging-info request is sent to the LB service, which queries the DB and returns the retrieved information.

The content of the logging information varies according to the type of event it relates to. The most common information fields are:

- Event (event type - possible event types are listed in Annex 7.3)
- source (WMS component which generated the event)
- result (result of the attempt)
- destination (destination where the job is being transferred to)
- timestamp (timestamp of event generation)

The --verbosity option sets the detail level of the returned information. It accepts three values: 0, 1 and 2. The default verbosity level is 0 unless otherwise specified in the UI configuration file $EDG_WL_LOCATION/etc/edg_wl_ui_cmd_var.conf (DefaultLoggingLevel parameter). The information listed above is displayed when the chosen verbosity level is 0. If the command is issued with verbosity level 1, the following additional information is shown:

- host (hostname of the machine where the event was generated)
- dest_host (destination hostname)
- dest_instance (instance of destination WMS component)
- user (identity -- cert. subj. -- of the generator)
- dest_jobid (destination internal jobid)
- node (worker node where the executable is run)
- ns (Network server handling the job)
- nsubjobs (number of subjobs)
- local_jobid (new jobId assigned by the receiving component)
- queue (destination queue name)
- status_code (way of job termination/classification of the cancel)

Lastly, if the command is issued with verbosity level 2, additional information, mostly consisting of the job description within the WMS component that has logged the event, is printed:

- jdl (job description)
- job (job description in receiver language)
- descr (description of current job transformation -- output of helper)
- classad (checkpoint state value)
- seqcode (sequence code assigned to the event)
- level (logging level -- system, debug, ...)
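Since each verbosity level adds fields to those of the lower levels, the set of fields that may be shown for a given level can be sketched as a cumulative lookup. The field names are taken from the lists above; the table and function names are hypothetical, and fields that do not apply to a given event are simply absent from the real output.

```python
from typing import Dict, List

# Fields introduced at each verbosity level, as listed above.
LOGGING_FIELDS_BY_LEVEL: Dict[int, List[str]] = {
    0: ["Event", "source", "result", "destination", "timestamp"],
    1: ["host", "dest_host", "dest_instance", "user", "dest_jobid", "node",
        "ns", "nsubjobs", "local_jobid", "queue", "status_code"],
    2: ["jdl", "job", "descr", "classad", "seqcode", "level"],
}


def fields_for_verbosity(level: int) -> List[str]:
    """Return every field that may be displayed at the given level."""
    if level not in LOGGING_FIELDS_BY_LEVEL:
        raise ValueError("verbosity must be 0, 1 or 2")
    fields: List[str] = []
    # A level includes the fields of all lower levels.
    for lower in range(level + 1):
        fields.extend(LOGGING_FIELDS_BY_LEVEL[lower])
    return fields
```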

Data on several jobs can be queried by specifying a list of job identifiers, separated by a blank space, as arguments of the command. Moreover, the --input option allows the user to specify a file (file_path) that contains the edg_jobIds whose information is requested. The file must contain one edg_jobId per line; comment lines must begin with a “#” or a “*” character. When using this option the user is asked to choose among all, one or a subset of the listed job identifiers. If file_path is not an absolute path, the file is searched for in the current working directory.

Each event logged in the LB has an associated log level according to the “Universal Format for Logger Messages” (see draft-abela-ulm-05.txt, available at http://www-didc.lbl.gov/NetLogger/draft-abela-ulm-05.txt). The default log level used by WMS components is System; however, in special situations where problems need to be investigated, additional events are logged with the Debug log level.

The --output option can be used to have the retrieved information written to the file identified by file_path instead of the standard output. file_path can be either a simple name or an absolute path (on the submitting machine). In the former case the file file_path is created in the current working directory.


If the user wants to use her/his “private” configuration file, this can be done with the --config file_path option.

OPTIONS

--help

displays command usage.

--version

displays UI version.

--input file_path -i file_path

retrieves logging info for all edg_jobIds contained in the file_path. This option can’t be used if one or more edg_jobIds have been specified. See 6.1.2 for details about this option.

--verbosity verb_level -v verb_level

sets the detail level of information about the job displayed to the user. Possible values for verb_level are 0,1 and 2.

--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--output file_path -o file_path

writes the logging information in the file specified by file_path instead of the standard output. file_path can be either a simple name or an absolute path (on the submitting machine). In the former case the file file_path is created in the current working directory.

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file edg-job-logging_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.


--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file edg-job-logging_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed to by file_path.

edg_jobId

job identifier returned by edg-job-submit. Job identifiers must always be provided as last arguments for this command.

EXIT STATUS

edg-job-get-logging-info exits with a value of 0 if the logging information of all the specified jobs is retrieved correctly, >0 if errors occurred for every specified job and <0 in case of partial failure. A partial failure can occur when more than one job is specified: logging info may be retrieved successfully for some jobs and not for others.

EXAMPLES

1. $> edg-job-get-logging-info --output mylog.txt \
https://ibm139.cnaf.infn.it:9000/GMUJtnNqe6Lq7w7MfOzeQw

writes logging information about the job identified by https://ibm139.cnaf.infn.it:9000/GMUJtnNqe6Lq7w7MfOzeQw to the file mylog.txt in the current working directory.

2. $> edg-job-get-logging-info -v 0 --input $HOME/myIds.txt

where $HOME/myIds.txt contains two job identifiers, displays the following output:

-------------------------------------------------------------------------------------------------------
1 : https://ibm139.cnaf.infn.it:9000/D4S_i25ffAsPnKB3iCqeaA
2 : https://ibm139.cnaf.infn.it:9000/2qzyCbPWr7pDY3rNh9PuXA
a : all
q : quit
-------------------------------------------------------------------------------------------------------

Choose one or more edg_jobId(s) in the list - [1-2]all: 2


**********************************************************************

LOGGING INFORMATION:

Printing info for the Job : https://ibm139.cnaf.infn.it:9000/2qzyCbPWr7pDY3rNh9PuXA

---
Event: RegJob
- source = UserInterface
- timestamp = Wed May 14 10:55:35 2003
---
Event: Transfer
- destination = NetworkServer
- result = START
- source = UserInterface
- timestamp = Wed May 14 10:55:36 2003
---
Event: Transfer
- destination = NetworkServer
- result = OK
- source = UserInterface
- timestamp = Wed May 14 10:55:44 2003
---
Event: Accepted
- source = NetworkServer
- timestamp = Wed May 14 10:56:42 2003
---
Event: EnQueued
- result = OK
- source = NetworkServer
- timestamp = Wed May 14 10:56:45 2003

**********************************************************************

… …

SEE ALSO [A2], [A4], edg-job-submit.


6.1.3.7. edg-job-attach

This command starts an interactive session for a previously submitted interactive job.

SYNOPSIS

edg-job-attach [options] <job Id>

Options:

--help

--version

--port, -p <port_num>

--nogui

--nolisten

--config, -c <file_path>

--input, -i <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

This command starts a listener process (grid_console_shadow) on the UI machine that attaches to the standard streams of a previously submitted interactive job and displays them in a dedicated window. As the command opens an X window, the user should make sure that the DISPLAY environment variable is correctly set, that an X server is running on the local machine and, if she/he is connected to the UI node from a remote machine (e.g. with ssh), that secure X11 tunneling is enabled.

The listener process and the window are started automatically by the edg-job-submit command for interactive jobs. This command is therefore useful, for example, when a problem on the UI machine caused the interactive session to be lost, or when the user needs to follow the job from another machine or from another port on the same machine (--port option).

This command can only be invoked for interactive jobs.

OPTIONS

--help

displays command usage.

--version

displays UI version.

--port port_num -p port_num

makes the command start a listener on the local machine on the specified port and log this information to the LB associated with the job.

--nogui

As the edg-job-attach command opens an X window, the user should make sure that an X server is running on the local machine and, if she/he is connected to the UI node from a remote machine (e.g. with ssh), that secure X11 tunneling is enabled. If this is not possible, the user can specify the --nogui option, which makes the command provide a simple standard non-graphical interaction with the running job.

--nolisten

This option makes the command forward the job standard streams coming from the WN to named pipes on the UI machine. The pipe names are returned to the user together with the OS id of the listener process. This allows the user to interact with the job through her/his own tools. Note that when this option is specified, the UI no longer has control over the launched listener process, which must therefore be killed by the user (through the returned process id) once the job is finished.

--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--input file_path -i file_path

allows the user to attach to one (just one) of the edg_jobIds contained in file_path. This option can’t be used if an edg_jobId has been specified. See 6.1.2 for details about this option.

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file edg-job-attach_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file edg-job-attach_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed to by file_path.

edg_jobId

job identifier returned by edg-job-submit. Job identifiers must always be provided as last arguments for this command.

EXIT STATUS

edg-job-attach exits with a value of 0 on success and >0 on failure.

EXAMPLES

$> edg-job-attach https://ibm139.cnaf.infn.it:9000/t3KwW8qhXhkYs-ZfNCFidg

displays the following information message:

**********************************************************************

JOB ATTACHED:

The Interactive Session Listener has been successfully launched

with the following parameters:

---

Host: 10.1.1.90

Port: 40713

Pid: 18575

**********************************************************************

and opens a window allowing interaction with the job through the standard streams.

6.1.3.8. edg-job-get-chkpt

This command retrieves checkpoint states saved by a previously submitted checkpointable job.

SYNOPSIS

edg-job-get-chkpt [options] <job Id>

Options:

--help


--version

--cs <state_num>

--config, -c <file_path>

--input, -i <file_path>

--output, -o <file_path>

--noint

--debug

--logfile <file_path>

DESCRIPTION

This command allows the user to retrieve one or more checkpoint states saved by a previously submitted job. Checkpoint states are retrieved from the LB server and are saved locally into a file in JDL format. The output file path can be set through the --output option of the command.

The --cs option allows the user to select the checkpoint state to be retrieved: launching the command with “--cs N” retrieves the last but N checkpoint state of the job. Otherwise the last saved state is retrieved.

This command can be used only for checkpointable jobs.

OPTIONS

--help

displays command usage.

--version

displays UI version.

--config file_path -c file_path

if the command is launched with this option, the configuration file pointed to by file_path is used instead of the standard configuration file.

--cs state_num

if the command is launched with this option then it retrieves the “last but state_num” state saved by the job. Last saved state is returned if the option is not used (equivalent to state_num = 0).

--input file_path -i file_path

allows the user to select one (just one) of the edg_jobIds contained in the file file_path for retrieval of the saved checkpoint state. This option cannot be used if an edg_jobId has been specified on the command line. See 6.1.2 for details about this option.

--output file_path -o file_path

saves the retrieved state in the file specified by file_path. file_path can be either a simple name or an absolute path (on the submitting machine). In the former case the file file_path is created in the current working directory. If this option is not used the retrieved state is displayed on the standard output.

--noint

if this option is specified, every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file edg-job-get-chkpt_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file edg-job-get-chkpt_<UID>_<PID>_<timestamp>.log under the /tmp directory. The location of the log file is configurable.

--logfile file_path

when this option is specified, the command log file is relocated to the location pointed to by file_path.

edg_jobId

job identifier returned by edg-job-submit. Job identifiers must always be provided as last arguments for this command.
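The “last but state_num” rule described for --cs can be pictured as indexing from the end of the chronologically ordered list of saved states. The following Python fragment is only a sketch of that rule: the list of states, the function name and the behaviour when too few states exist are assumptions made for illustration, not the command's actual implementation.

```python
def select_checkpoint(states, state_num=0):
    """Return the "last but state_num" entry of `states`.

    `states` is a hypothetical list of saved checkpoint states, oldest first.
    state_num=0 corresponds to omitting --cs (the last saved state).
    """
    if state_num < 0:
        raise ValueError("state_num must be non-negative")
    if state_num >= len(states):
        # Assumed behaviour: the guide does not say what happens in this case.
        raise IndexError("fewer than state_num + 1 states have been saved")
    return states[-1 - state_num]


saved = ["state-1", "state-2", "state-3", "state-4", "state-5"]
print(select_checkpoint(saved))     # the last saved state
print(select_checkpoint(saved, 3))  # the last-but-3 saved state
```

With five saved states, “--cs 3” therefore corresponds to the second state that was saved.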

EXIT STATUS

edg-job-get-chkpt exits with a value of 0 on success and >0 on failure.

EXAMPLES

The following command retrieves the last-but-3 saved checkpoint state of the job and saves it in the file specified by the user:

$> edg-job-get-chkpt -o myjob.chk --cs 3 https://ibm139.cnaf.infn.it:9000/LNn4rOX17LL30e34hSqGjQ


======================= edg-job-get-chkpt Success =======================

The checkpointable Job state has been successfully retrieved from LB

Server and stored in the file: /home/fpacini/CLI/bin/myjob.chk

=========================================================================

$> more /home/fpacini/CLI/bin/myjob.chk
# Job State Retrieved for
#edg_jobId: https://ibm139.cnaf.infn.it:9000/LNn4rOX17LL30e34hSqGjQ
[
  UserData =
  [
    distribution = false;
    hsum_filename = "gsiftp://lxde01.pd.infn.it/tmp/root_test/hsum_lxde04_1200000.root";
    first_event = 1200001
  ];


7. ANNEXES

7.1. JDL ATTRIBUTES

The JDL is a fully extensible language (i.e. it does not rely on a fixed schema), hence the user may use any attribute in the description of a job without incurring errors. However, only a certain set of attributes (referred to as “supported” attributes) is taken into account by the WMS components when scheduling and submitting a job. The “supported” attributes, their meaning and the way to use them to describe a job are covered in detail in document [A1].

7.2. JOB STATUS DIAGRAM

The following figure reports the statuses that a job can assume during its life cycle.


Figure 2 Job Life Cycle

The job statuses shown in Figure 2 are briefly described hereafter (see [A4] for further details):

STATUS:

- SUBMITTED: the job has been entered by the user into the User Interface but not yet transferred to the Network Server for processing.
- WAITING: the job has been accepted by the NS and is waiting for Workload Manager processing or is being processed by WM Helper modules (e.g., the WM is busy, no appropriate Computing Element (cluster) has been found yet, a required dataset is not available, the job is waiting for resource allocation).
- READY: the job has been processed by the WM and its Helper modules (in particular, an appropriate Computing Element has been found) but not yet transferred to the Computing Element (local batch system queue) via the Job Controller and Condor-G.


- SCHEDULED: the job is waiting in the queue on the Computing Element.
- RUNNING: the job is running.
- DONE: the job has exited or is considered to be in a terminal state by Condor-G (e.g., submission to the CE has failed in an unrecoverable way).
- ABORTED: job processing was aborted by the WMS (waiting in the Workload Manager queue or Computing Element for too long, over-use of quotas, expiration of user credentials, etc.).
- CANCELLED: the job has been successfully cancelled on user request.
- CLEARED: the output sandbox was transferred to the user or removed due to the timeout.


7.3. JOB EVENT TYPES

The list below reports the job event types that can be returned to the user by the edg-job-get-logging-info command. They are organized in several categories:

• Events concerning a job transfer between components:

JobTransfer    A component generates this event when it tries to transfer a job to some other component via a network interface (protocol). This event contains the identification of the receiver and possibly the job description expressed in the language accepted by the receiver. The result of the transfer, i.e. success or failure, as seen by the sender is also included.

JobAccepted    A component generates this event when it receives a job from another WMS component. This event also contains the locally assigned job identifier.

JobRefused     The receiving component could not accept the job; the reason is part of the event.

JobEnqueue     The job is inserted into a queue, e.g., the queue holding the job after it is received by the Network Server and before it is processed by the Workload Manager.

JobDequeue     The job is removed from a queue.

HelperCall     A Helper component is called during the job processing. The type-specific data include the name of the called Helper, whether the logging component is the called or the calling one, and optionally the parameters passed to the Helper.

HelperReturn   The call to the Helper returned.

• Events concerning a job state change during processing within a component:

JobAbort       The job processing is stopped by the WMS due to an error condition; the event contains the reason for the abort.

JobRun         The job is started on a CE.

JobDone        The job has exited, has been successfully cancelled or is considered to be in a terminal state by Condor-G.

JobResub       The result of the resubmission decision after the job has failed.

JobCleared     The user has successfully retrieved the job results, e.g. the output files specified in the output sandbox, or the job results have been deleted due to the time limit.

JobCancel      A cancel operation has been attempted on the job.

JobPurge       The job was purged from the bookkeeping server's database. This event is stored only in a logging server.

• Events associated with the Workload Manager or Helper modules:

JobMatch       An appropriate match between a job and a Computing Element has been found. The event contains the identifier of the selected CE.


JobPending     A match between a job and a suitable Computing Element was not found, so the job is kept pending by the WM. The event contains the reason why no match was found.

• Events used to store special information in logging and bookkeeping services:

JobRegister    Logged by the job creator (User Interface) in order to register the job with the bookkeeping server.

JobChkpt       An application-specific checkpoint was created (logged by the checkpointing API). Checkpoint tag and ClassAd strings should be included.

JobListener    Used by the UI to store listener network port information for interactive jobs. Listener port number, hostname and service name (multiple ports can be advertised) are included.

JobCurJdl      This optional event can be used to report the ClassAd describing the current state of job processing (output from Helper modules).

More details on job event types can be found in [A4].
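When post-processing the events of a job it can be handy to group them by the categories listed in this section. The Python fragment below is a minimal sketch of such a grouping; the category labels, the function name and the idea of feeding it a plain list of event names are assumptions made for the example and do not reflect the output format of edg-job-get-logging-info.

```python
# Event names are taken from this section; the category keys are
# hypothetical labels chosen for the example.
EVENT_CATEGORIES = {
    "transfer": ["JobTransfer", "JobAccepted", "JobRefused", "JobEnqueue",
                 "JobDequeue", "HelperCall", "HelperReturn"],
    "state_change": ["JobAbort", "JobRun", "JobDone", "JobResub",
                     "JobCleared", "JobCancel", "JobPurge"],
    "workload_manager": ["JobMatch", "JobPending"],
    "special_information": ["JobRegister", "JobChkpt", "JobListener",
                            "JobCurJdl"],
}

# Reverse index: event name -> category label.
CATEGORY_OF = {
    event: category
    for category, events in EVENT_CATEGORIES.items()
    for event in events
}


def group_by_category(event_names):
    """Group event names by category, preserving their original order."""
    grouped = {}
    for name in event_names:
        grouped.setdefault(CATEGORY_OF.get(name, "unknown"), []).append(name)
    return grouped


print(group_by_category(["JobRegister", "JobTransfer", "JobMatch", "JobRun"]))
```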


7.4. SUBMISSION FAILURES ANALYSIS

The state of a failed job can be analysed by checking the consistency and completeness of the job-related events returned by the edg-job-get-logging-info command. If needed, a further verification should then be performed on the output files produced by the job (if any) and by inspecting the log files and debugging information traced by the various system components.

As explained in section 6.1.3.6, to get the logging information about a job you need to issue the following command:

edg-job-get-logging-info <job_Identifier>

Since the output of the command can be copious, we advise also using the --output option to redirect it to a given file:

edg-job-get-logging-info --output <my_file> <job_Identifier>

Using the -v option with verbosity level 2 then provides more detailed information (the job descriptions at the various stages are also included):

edg-job-get-logging-info -v 2 --output <my_file> <job_Identifier>

Before using the edg-job-get-logging-info command, it is in some cases useful to check the edg-job-status output, which can contain information about the cause of a job failure. As explained in section 6.1.3.5, to get the status information about a job you need to issue the following command:

edg-job-status <job_Identifier>

As said at the beginning of this section, another way of analysing submission failures is to inspect the standard output and error of the job generated on the Worker Node and retrieved on the UI machine through the edg-job-get-output command. A typical example of an error that can be detected in this way occurs when the user submits a script that in turn tries to start another script or an executable. E.g. the submitted script is like:

#!/bin/sh
# Use the coincidence file to compare the measurements
curdir=`pwd`
${curdir}/lecture_new_gome_V2_sel1_PT_10
idl appli.pro

Upon job abortion, the error message received through the OutputSandbox retrieval is:


./demo_june: /home/eo004/3042/lecture_new_gome_V2_sel1_PT_10: Permission denied

The reason for this error is that globus-url-copy (used for staging the InputSandbox files) in general does not preserve the x flag. The script specified as Executable in the JDL (on which chmod +x is done automatically by the WP1 JobWrapper) should therefore perform a chmod +x on all the executable files (lecture_new_gome_V2_sel1_PT_10 in this example) transferred within the InputSandbox of the job.
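If the Executable of the job happens to be a Python script rather than a shell script, the same remedy can be applied before launching the staged helpers. The fragment below is a sketch under that assumption; the function name is hypothetical and the demonstration works on a temporary file that merely simulates a staged file which lost its x flag.

```python
import os
import stat
import tempfile


def add_execute_permission(paths):
    """Add the execute bits to each file in `paths` (similar to chmod a+x)."""
    for path in paths:
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        helper = os.path.join(workdir, "lecture_new_gome_V2_sel1_PT_10")
        with open(helper, "w") as handle:
            handle.write("#!/bin/sh\necho ok\n")
        os.chmod(helper, 0o644)           # simulate the lost x flag
        add_execute_permission([helper])  # restore it before running the file
        print(os.access(helper, os.X_OK))
```

In a real job the list passed to the function would contain the executable files shipped in the InputSandbox.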


7.5. JOB RESUBMISSION AND RETRYCOUNT

It is important to note that there are particular cases, for example temporary network outages in the proximity of the CE, that can make the WMS “think” the job has failed (there is currently no way to distinguish such situations by means of error reporting from the underlying components) and hence trigger job resubmission while the job is still running on the WN. This can result in one or more copies of the same job running on different CEs, since the WMS is not able to kill the original job as long as the network is down.

Because of this possibility (although it should occur very rarely), it is advisable for jobs performing sensitive operations (e.g. committing data into a DB) to disable the WMS resubmission feature. This can easily be done on a per-job basis by setting the RetryCount attribute in the job description to 0, and on a “per-session” basis by setting the RetryCount parameter in the UI configuration to 0 (see 4.5.3.1 and 4.5.3.2).

7.6. WILDCARD PATTERNS

The wildcard patterns that can be included in the InputSandbox attribute expression are used by the UI to perform file name “globbing” in a fashion similar to the UNIX csh shell. The result of the “globbing” is a list of the files whose names match any of the specified patterns. The admitted special characters together with their meaning are listed hereafter:

- *         wildcard for any string
- ?         wildcard for any single character
- [chars]   delimits a wildcard matching any of the enclosed characters. If chars contains a sequence of the form a-b then any character between a and b (inclusive) will match. Such an expression can be negated by means of the special character “!” ([!chars] matches any character not in chars).

EXAMPLES

Consider a directory where “ls -F” gives:

1file    a1.f      apple.o   bob.o   h4374.f   john.o
2files   ab        apps/     foo.c   h4374.o   mydir/
ABS      ab.f      bob       foo.f   john      stuff/
a1       apple.f   bob.f     gh      john.f

That is to say, some files and directories. The examples below show how the wildcards described above are expanded (the notation => indicates the result of typing the command).

1) Every two-character file name:

echo ?? => a1 ab gh


2) Every two-character name starting with “a”:

echo a? => a1 ab

3) Every file starting with j, o, h, or n:

echo [john]* => h4374.f h4374.o john john.f john.o

4) Include a range, e.g. everything starting with an upper case letter or a digit:

echo [A-Z0-9]* => 1file 2files ABS

5) Negate a range:

echo [!john]*.f => a1.f ab.f apple.f bob.f foo.f

6) Every file starting with “a” and ending in .f:

echo a*.f => a1.f ab.f apple.f
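The expansions above can be reproduced outside a shell with Python's standard fnmatch module, which understands the same *, ?, [chars] and [!chars] constructs. This is only a stand-in for experimenting with patterns; it is not the code the UI uses to expand the InputSandbox expression, and the helper name is invented for the example.

```python
from fnmatch import fnmatchcase

# The directory content of the example (directory names without the
# trailing "/" that "ls -F" appends).
NAMES = [
    "1file", "2files", "ABS", "a1", "a1.f", "ab", "ab.f", "apple.f",
    "apple.o", "apps", "bob", "bob.f", "bob.o", "foo.c", "foo.f", "gh",
    "h4374.f", "h4374.o", "john", "john.f", "john.o", "mydir", "stuff",
]


def expand(pattern, names=NAMES):
    """Return the sorted names matching `pattern` (case-sensitive)."""
    return sorted(name for name in names if fnmatchcase(name, pattern))


for pattern in ["??", "a?", "[john]*", "[A-Z0-9]*", "[!john]*.f", "a*.f"]:
    print("echo", pattern, "=>", " ".join(expand(pattern)))
```

Running the loop prints one line per pattern, in the same “echo pattern => names” form used in the numbered examples.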


7.7. THE MATCH MAKING ALGORITHM

The main task performed by the RB (aka Matchmaker) is to find the most suitable Computing Element on which to execute the job. In order to accomplish this task the RB interacts with the other EDG components. More precisely, the Replica Location Service (RLS) and the GOUT/Information Index (II) are the two main components that supply the RB with all the information required for resolving the matches between job requirements and Computing Element capabilities (i.e. runtime environments, data access features, processing resources etc.). The following sections describe the matchmaking algorithm performed by the RB. To this end it is worth identifying three different scenarios to be dealt with separately:

− direct job submission,
− job submission without data-access requirements,
− job submission with data-access requirements.

7.7.1. Direct Job Submission

The simplest scenario is the one where the JDL submitted by the UI contains a reference to the resource to which the job must be submitted, i.e. the Computing Element identifier (CEId). In this case the RB does not perform any matchmaking at all: the job is simply submitted to the specified CE.

Figure 3 content: the UI sends the JDL, which specifies

CE = lxde01.pd.infn.it:2119/jobmanager-lsf-grid01

over the WAN / LAN to the WM - RB, which forwards the job over the WAN / LAN to lxde01.pd.infn.it:2119/jobmanager-lsf-grid01.

Figure 3 - Submission with specified CEId

It should be pointed out that, if the CEId is specified, the RB neither checks whether the user who submitted the job is authorised to access the given CE, nor interacts with the RLS for the resolution of file requirements, if any. The only check performed by the RB is the JDL syntax check, carried out while converting the JDL into a ClassAd.

7.7.2. Job submission without data-access requirements

Let us take a small step forward and consider the scenario where the user specifies a job with given execution requirements, but without data constraints. Once the JDL has been received by the RB and successfully converted into a ClassAd (job-ad), the RB starts the actual matchmaking algorithm to find out whether the characteristics and status of Grid resources match the job requirements.


The matchmaking algorithm consists of two different phases: the requirements check and the rank computation. During the requirements check phase the RB contacts the GOUT/II in order to create a set of suitable CEs on which to execute the job, i.e. CEs that are compliant with the user requirements and on which the user is authorized to submit jobs. Since the CE attributes involved in the JDL requirements (defined by the user to express his/her needs) usually refer to “static” information (such as operating system, installed software runtime environments, etc.), the information cached in the GOUT/II is a good source for testing matches between job requirements and CE features. Using it is clearly more efficient than contacting each CE to find out the same information.

Figure 4 content: the UI sends the JDL over the WAN / LAN to the WM - RB, which retrieves information about CEs from the II / GOUT over the WAN / LAN.

JDL:

[
...
requirements = other.GlueCEInfoLRMSType == "pbs" && member("CMS3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
rank = other.GlueCEStateFreeCPUs;
...
]

Suitable CEs:

skurut.cesnet.cz:2119/jobmanager-pbs-wp1
bbq.mi.infn.it:2119/jobmanager-pbs-dque

Figure 4 - Requirements checking phase

Once the RB has created the set of suitable CEs where the job can be executed, it performs the second phase of the matchmaking algorithm, which allows it to acquire information about the “quality” of the suitable CEs just found. In the ranking phase the RB directly contacts the LDAP server (i.e. the GRIS) of the involved CEs to obtain the values of the attributes appearing in the rank expression of the received JDL. It should be pointed out that, unlike in the previous phase, it is better to contact each suitable CE rather than using the GOUT/II as the source of information, since the rank attributes usually refer to variables that change very frequently over time (e.g. FreeCPUs, FreeMemory).


If there are two or more CEs that meet all the requirements and have the same best rank, then the CE is chosen among them at random (all these CEs have the same probability of being chosen). The default policy the RB adopts while performing the matchmaking algorithm is to select the CEs with the maximum rank value. Therefore, the more frequently the variables involved in the rank expression change their values, the higher the probability that the matchmaking algorithm yields different CEs. As explained in [A1], it is also possible to enable “fuzziness” in the matchmaking, i.e. to force the matchmaking algorithm to adopt a stochastic selection criterion while searching for the best matching CE. This can be done by specifying the following attribute

FuzzyRank = true;

in the submitted JDL. In this case, the rank value associated with each matching CE represents the probability that the CE is selected as the best matching one. Namely, the higher the rank value, the higher the probability of being selected. Rank computation is depicted in Figure 5.

Figure 5 content: the UI sends the JDL over the WAN / LAN to the WM - RB, which retrieves information about CEs over the WAN / LAN (II / GOUT).

JDL:

[
...
requirements = other.GlueCEInfoLRMSType == "pbs" && member("CMS3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
rank = other.GlueCEStateFreeCPUs;
...
]

Suitable CEs:

skurut.cesnet.cz:2119/jobmanager-pbs-wp1
bbq.mi.infn.it:2119/jobmanager-pbs-dque

Figure 5 – Rank computation phase
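The two phases described in this section (requirements check, then rank computation with a random choice among equally ranked CEs) can be summarised by the following Python sketch. It is an illustration under stated assumptions, not the RB code: CEs are represented by hypothetical dictionaries, the two callables stand in for the JDL requirements and rank expressions, the attribute values are invented, and the FuzzyRank behaviour is modelled as a draw with probability proportional to the rank value, which is one possible reading of the text.

```python
import random


def choose_ce(ces, requirements, rank, fuzzy=False, rng=random):
    """Select a CE by requirements check followed by rank computation."""
    # Phase 1: keep only the CEs that satisfy the requirements.
    suitable = [ce for ce in ces if requirements(ce)]
    if not suitable:
        return None

    # Phase 2: evaluate the rank expression for every suitable CE.
    ranks = [rank(ce) for ce in suitable]

    if fuzzy:
        # Assumption: selection probability proportional to the rank value.
        weights = [max(r, 0) for r in ranks]
        if sum(weights) > 0:
            return rng.choices(suitable, weights=weights, k=1)[0]
        return rng.choice(suitable)  # assumption: uniform draw otherwise

    best = max(ranks)
    # CEs sharing the best rank have the same probability of being chosen.
    return rng.choice([ce for ce, r in zip(suitable, ranks) if r == best])


# Illustrative data only: the attribute values are invented.
CES = [
    {"id": "skurut.cesnet.cz:2119/jobmanager-pbs-wp1", "lrms": "pbs",
     "runtime": ["CMS3.2"], "free_cpus": 4},
    {"id": "bbq.mi.infn.it:2119/jobmanager-pbs-dque", "lrms": "pbs",
     "runtime": ["CMS3.2"], "free_cpus": 10},
    {"id": "lxde01.pd.infn.it:2119/jobmanager-lsf-grid01", "lrms": "lsf",
     "runtime": [], "free_cpus": 50},
]

chosen = choose_ce(
    CES,
    requirements=lambda ce: ce["lrms"] == "pbs" and "CMS3.2" in ce["runtime"],
    rank=lambda ce: ce["free_cpus"],
)
print(chosen["id"])
```

Passing a seeded random.Random instance as rng makes the random choices reproducible, which is convenient when experimenting with the fuzzy variant.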


7.7.3. Job submission with data-access requirements

The Resource Broker interacts with the WP2 Replica Management services in order to find the most suitable Computing Element, taking into account the Storage Elements where the input data sets are physically stored and where the output data sets should be staged on completion of job execution. Before describing the actions taken by the RB upon reception of a JDL where both data-access and computing requirements are present, it is worth recalling the JDL attributes that represent a data requirement on the RB side: OutputSE, InputData and DataAccessProtocol, respectively representing the Storage Element (SE) where the output file should be staged, the input files (LFNs, GUIDs) required for the actual job execution and the protocol “spoken” by the application to access such files.

The two main phases of the matchmaking algorithm performed by the RB remain unchanged, but the RB executes the requirements check and ranking for each class of CEs satisfying the data-access requirements. Additionally, the RB performs a pre-match processing to find and classify the CEs satisfying both data-access and user authorisation requirements. During the pre-match processing phase the RB contacts the RLS in order to resolve logical file names and collect all the information about SEs containing at least one input data file. This information is used to write the broker-info-file, which is shipped, within the input sandbox, to the WN where the execution will take place.

At this point the RB is ready to start the CEs classification procedure, during which the RB contacts the II/GOUT in order to find the CEs that satisfy the authorization requirements and have the OutputSE “close” to them. Using the information retrieved during the file name resolution, the RB classifies those CEs according to the number of input files stored in storage element(s) that are close to the CE itself and speak at least one of the protocols specified in the DataAccessProtocol JDL attribute.


ClassifyCEOnDataAccess

Retrieve SEs information and resolve the LFN -> PFN mapping if so required

While there exists a ComputingElement (CEId) such that AuthorizedUser = JDL.CertificateSubject

    JDL.OutputSE is a CloseSE for CEId ?
        No: go to the next ComputingElement
        Yes:
            For each StorageElement close to CEId
                Does at least one SEProtocol for such SE belong to DataAccessProtocol ?
                    No: go to the next StorageElement
                    Yes: retrieve the MountPoint and count the number N of distinct InputData files it supplies access to
            End For
            Files_x_CE[CEId] = files.size()

End While

Figure 6 - CEs classification procedure
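The classification procedure can be sketched in Python as follows. This is a minimal illustration, not the RB implementation: the dictionary layouts, the identifiers (ce1, se1, lfn:a, the certificate subject) and the choice to skip the OutputSE test when the job does not specify one are all assumptions made for the example; `storage_elements` stands in for the result of the RLS file name resolution.

```python
def classify_ces_on_data_access(ces, storage_elements, job):
    """Return {CE id: number of distinct input files reachable from that CE}."""
    wanted = set(job["InputData"])
    protocols = set(job["DataAccessProtocol"])
    output_se = job.get("OutputSE")  # assumption: no constraint when absent
    files_x_ce = {}
    for ce in ces:
        # Only CEs where the user is authorized are considered.
        if job["CertificateSubject"] not in ce["authorized_users"]:
            continue
        # The OutputSE must be close to the CE.
        if output_se is not None and output_se not in ce["close_ses"]:
            continue
        files = set()
        for se_id in ce["close_ses"]:
            se = storage_elements[se_id]
            # The SE must speak at least one of the requested protocols.
            if protocols & set(se["protocols"]):
                files |= wanted & set(se["files"])
        files_x_ce[ce["id"]] = len(files)
    return files_x_ce


# Hypothetical data, invented for the example.
USER = "/O=Grid/CN=Some User"
SES = {
    "se1": {"protocols": ["gridftp"], "files": ["lfn:a", "lfn:b"]},
    "se2": {"protocols": ["rfio"], "files": ["lfn:b", "lfn:c"]},
    "se3": {"protocols": ["gridftp"], "files": ["lfn:c"]},
}
CLASSIFY_CES = [
    {"id": "ce1", "authorized_users": [USER], "close_ses": ["se1", "se2"]},
    {"id": "ce2", "authorized_users": [USER], "close_ses": ["se2"]},
    {"id": "ce3", "authorized_users": [USER], "close_ses": ["se3"]},
    {"id": "ce4", "authorized_users": [], "close_ses": ["se1", "se2"]},
]
JOB = {
    "CertificateSubject": USER,
    "OutputSE": "se2",
    "InputData": ["lfn:a", "lfn:b", "lfn:c"],
    "DataAccessProtocol": ["gridftp"],
}

print(classify_ces_on_data_access(CLASSIFY_CES, SES, JOB))
```

In this example ce3 is discarded because the OutputSE is not close to it and ce4 because the user is not authorized there; se2 contributes no files since it does not speak gridftp.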

Upon completion of the CE data classification, the RB is ready for the actual matchmaking and starts the requirements checking phase for each CE belonging to the first non-empty class of CEs, i.e. the class that can access the highest number of distinct files. If a CE does not satisfy the user requirements it is removed from its class. The requirements checking phase is repeated, class by class, until at least one CE matching the user requirements is found. Once the requirements checking phase is completed, either the RB knows a set of CEs satisfying both data-access and computing requirements and having access to the maximum number of distinct input files, or no suitable CE matching such requirements exists. In the first case the RB starts the ranking phase in order to find the best CE to which to submit the job.


Start

max_files = max(Files_x_CE)

do
    theJob.SuitableCEs = CEs with max_files
    check requirements
while theJob.SuitableCEs.empty AND --max_files

(During the check requirements phase the CEs which do not match the job requirements are removed from theJob.SuitableCEs.)

max_files > 0 ?
    Yes: compute rank, choose the highest ranked CE
    No: No matching resource found!

End

Figure 7 - Match-Making algorithm
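The class-by-class search of Figure 7 can be sketched in Python as shown below. The function and parameter names are hypothetical, the input is assumed to be a mapping from CE identifier to the number of accessible input files, and the loop stops before the class with zero accessible files, which is one reading of the loop condition in the figure rather than a documented rule. Ties in rank are resolved here by simply taking the first CE found, for brevity.

```python
def match_with_data_access(files_x_ce, meets_requirements, rank):
    """Return the best CE from the richest non-empty class, or None."""
    max_files = max(files_x_ce.values(), default=0)
    while max_files > 0:
        # CEs of the current class that also satisfy the job requirements.
        suitable = [
            ce for ce, count in files_x_ce.items()
            if count == max_files and meets_requirements(ce)
        ]
        if suitable:
            # Ranking phase on the surviving CEs of this class.
            return max(suitable, key=rank)
        max_files -= 1  # fall back to the next class
    return None  # no matching resource found


# Hypothetical input: CE identifier -> number of accessible input files.
files_x_ce = {"ce1": 3, "ce2": 3, "ce3": 1, "ce4": 0}
free_cpus = {"ce1": 2, "ce2": 5, "ce3": 8, "ce4": 20}

best = match_with_data_access(
    files_x_ce,
    meets_requirements=lambda ce: ce in {"ce3", "ce4"},
    rank=lambda ce: free_cpus[ce],
)
print(best)
```

Here neither CE of the richest class satisfies the requirements, so the search falls back to the class of ce3.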

A special case is when the getAccessCost method has been specified as rank (see [A1]), i.e.:

Rank = other.DataAccessCost;

in this case the CE is chosen among the CEs that satisfy the non-data requirements (i.e. the ones specified in the Requirements JDL expression) and on which the user is allowed to submit jobs. The choice of the “best” CE among them is delegated to the getAccessCost function, i.e. the CE from which the cost of accessing the data is the lowest is the “best” one.
