Grid Workload Management Massimo Sgaravatto INFN Padova.


Page 1: Grid Workload Management Massimo Sgaravatto INFN Padova.

Grid Workload Management

Massimo Sgaravatto, INFN Padova

Page 2: Grid Workload Management Massimo Sgaravatto INFN Padova.

Grid Workload Management

- WP goal: define and implement a suitable architecture for distributed scheduling and resource management in a GRID environment
  - Large heterogeneous environment
  - Large numbers (thousands) of independent users
- Many challenging issues:
  - Optimizing the choice of execution location based on the availability of data, computation and network resources
  - Uniform interface to possibly different local resource management systems under different administrative domains
  - Priorities and policies on resource usage
  - Reliability, scalability, …
  - …

http://www.infn.it/workload-grid

Page 3: Grid Workload Management Massimo Sgaravatto INFN Padova.

Approach

- We need much more experience with the various grid issues
- The application requirements are not completely defined yet; they will evolve as more familiarity with the grid model is acquired
- Fast prototyping instead of a classic top-down approach

Page 4: Grid Workload Management Massimo Sgaravatto INFN Padova.

Current activities

- Report on current technology for Grid scheduling and resource management
  - Globus resource management
  - Condor
  - Survey of Grid scheduling systems
- Focus on the implementation of a first prototype workload management system
  - This part will be plugged together with the parts implemented by the other WPs to form the project month 9 (September) deliverable
- Grid accounting

Page 5: Grid Workload Management Massimo Sgaravatto INFN Padova.

Functionalities foreseen for the 1st release

- First version of job description language (JDL)
- First version of resource broker
- Job submission service
- First version of bookkeeping and logging services
- First user interface

Page 6: Grid Workload Management Massimo Sgaravatto INFN Padova.

Block diagram of the currently foreseen components of the workload management system

- Not a real architecture
- Functional interactions among the various components
- Dependencies on “external” functionalities

Page 7: Grid Workload Management Massimo Sgaravatto INFN Padova.
Page 8: Grid Workload Management Massimo Sgaravatto INFN Padova.

Job Description Language (JDL)

- First release of the job description language (JDL), used when the job is submitted to specify the job characteristics (application, input data set id, resources [required and preferable], …)
- A document describing the syntax and semantics of a “prototype” JDL, based on Condor ClassAds, was prepared
- Ready to collect feedback from applications
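
To make the idea of a ClassAd-based job description more concrete, here is a minimal sketch in Python. It is not the prototype JDL: every attribute name (`Executable`, `InputData`, `Requirements`, `Rank`) and every value is a hypothetical illustration, and the rendering only approximates the bracketed ClassAd notation.

```python
# Hypothetical job description: attribute names and values are illustrative
# assumptions, not taken from the prototype JDL document.
job = {
    "Executable": "simulate.sh",
    "InputData": "dataset-001",
    # "Required" resources: an expression the chosen resource must satisfy.
    "Requirements": 'other.OpSys == "LINUX" && other.FreeCPUs >= 1',
    # "Preferable" resources: an expression used to order acceptable resources.
    "Rank": "other.FreeCPUs",
}

# Attributes whose values are expressions and must not be quoted as strings.
EXPRESSION_ATTRS = {"Requirements", "Rank"}


def to_classad_text(ad):
    """Render a dict as ClassAd-like text: one 'Name = value;' per attribute."""
    parts = []
    for name, value in ad.items():
        if name in EXPRESSION_ATTRS:
            rendered = value
        elif isinstance(value, str):
            rendered = '"' + value.replace('"', '\\"') + '"'
        else:
            rendered = str(value)
        parts.append(f"  {name} = {rendered};")
    return "[\n" + "\n".join(parts) + "\n]"


if __name__ == "__main__":
    print(to_classad_text(job))
```

Keeping requirements and preferences as separate expressions mirrors the "required and preferable" distinction on the slide: the first filters resources, the second orders the ones that remain.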

Page 9: Grid Workload Management Massimo Sgaravatto INFN Padova.

Resource Broker

- First version of the resource broker, which chooses the computing resources (queues or “single” nodes) where jobs are submitted, considering:
  - Access policies (grid-mapfiles in the Globus-based prototype)
  - Characteristics and status of resources
  - Availability of the input data set
  - Availability of the required run time/application environments
- Resources required are specified in the JDL
- Resources required are published in an Information Space (Globus GIS in the first prototype) + Replica Catalog
- Ongoing implementation based on the Condor matchmaking library (Salvatore’s presentation)
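
The selection criteria listed above can be sketched as a simple filter-then-rank step. This is only an illustration in plain Python, not the Condor matchmaking library the implementation is based on; the `Resource` and `Job` types, their fields, and the "most free CPUs wins" ranking are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical view of a computing resource as seen by the broker."""
    name: str
    authorized_users: set          # stands in for the access policy
    free_cpus: int                 # stands in for "characteristics and status"
    environments: set = field(default_factory=set)


@dataclass
class Job:
    """Hypothetical job requirements, as they might be taken from the JDL."""
    user: str
    input_dataset: str
    environment: str
    min_cpus: int = 1


def choose_resource(job, resources, replica_catalog):
    """Return the best matching resource, or None if nothing matches.

    replica_catalog maps a data set id to the set of resource names
    that hold a copy of it (an assumed, simplified structure).
    """
    sites_with_data = replica_catalog.get(job.input_dataset, set())
    candidates = [
        r for r in resources
        if job.user in r.authorized_users          # access policy
        and r.free_cpus >= job.min_cpus            # resource status
        and r.name in sites_with_data              # input data availability
        and job.environment in r.environments      # run time environment
    ]
    if not candidates:
        return None
    # Assumed ranking: prefer the resource with the most free CPUs.
    return max(candidates, key=lambda r: r.free_cpus)


if __name__ == "__main__":
    resources = [
        Resource("site1", {"alice"}, 4, {"envA"}),
        Resource("site2", {"alice"}, 8, {"envA"}),
        Resource("site3", {"bob"}, 16, {"envA"}),
    ]
    catalog = {"ds1": {"site1", "site2", "site3"}}
    chosen = choose_resource(Job("alice", "ds1", "envA"), resources, catalog)
    print(chosen.name if chosen else "no match")
```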

Page 10: Grid Workload Management Massimo Sgaravatto INFN Padova.

Information Service

- All the information needed by the broker is published in one Grid Information Space (Globus GIS/MDS for the first release)
- New MDS 2 alpha release soon available
  - Should address some of the existing shortcomings
- Necessary to implement plug-in modules:
  - Index (for a first-level query, to identify a set of candidate resources)
  - Information providers (to publish needed information about resources)
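
The split between an index and information providers suggests a two-step lookup: a coarse first-level query to narrow down candidates, then a detailed query per candidate. The sketch below illustrates that pattern only; it does not use the MDS API, and the dictionary layout, the attribute names (`lrms`, `free_cpus`, `queue_length`) and both function names are assumptions.

```python
def first_level_query(index, predicate):
    """Return the names of resources whose indexed summary satisfies predicate."""
    return [name for name, summary in index.items() if predicate(summary)]


def detailed_info(providers, candidates):
    """Ask the information provider of each candidate for its current details."""
    return {name: providers[name]() for name in candidates}


if __name__ == "__main__":
    # Assumed index content: a coarse, possibly stale summary per resource.
    index = {
        "site1": {"lrms": "PBS", "free_cpus": 0},
        "site2": {"lrms": "LSF", "free_cpus": 12},
    }
    # Assumed providers: callables returning up-to-date details on demand.
    providers = {
        "site1": lambda: {"queue_length": 40},
        "site2": lambda: {"queue_length": 3},
    }
    candidates = first_level_query(index, lambda s: s["free_cpus"] > 0)
    print(detailed_info(providers, candidates))
```

Only the candidates that pass the first-level query are asked for details, which keeps the number of provider queries small when there are many resources.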

Page 11: Grid Workload Management Massimo Sgaravatto INFN Padova.

Job submission service

- Job submission service based (for the first release) on:
  - Globus GRAM
  - Condor-G on top of Globus GRAM (to implement a reliable job submission service)
- Globus GRAM
  - Comprehensive evaluation already done (collaboration with the “Evaluation of the Globus toolkit” WP)
  - Globus GRAM as uniform interface to different underlying resource management systems (LSF, Condor, PBS)
  - GRAM reporter (GRAM – GIS interaction)
  - RSL
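
Because GRAM takes its job requests in RSL, a component sitting above it has to turn a job description into an RSL string at some point. The sketch below shows that step for the simplest case, a conjunction of `(attribute=value)` relations. It is a simplified illustration: the helper name `to_rsl` is hypothetical, and RSL quoting and escaping rules for values containing spaces or special characters are deliberately not handled.

```python
def to_rsl(attributes):
    """Render a dict as a simple RSL conjunction: &(name=value)(name=value)...

    Simplification: values are inserted as-is, so values that would need
    quoting in real RSL are outside the scope of this sketch.
    """
    relations = "".join(f"({name}={value})" for name, value in attributes.items())
    return "&" + relations


if __name__ == "__main__":
    request = {"executable": "/bin/hostname", "count": 1}
    print(to_rsl(request))
```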

Page 12: Grid Workload Management Massimo Sgaravatto INFN Padova.

Job submission service

- Condor-G
  - First prototype implementation already tested
    - Promising, but many problems to fix
  - New Condor-G implementation under testing
    - Many problems fixed, but other issues still open
  - Another new Condor-G implementation hopefully released in a few weeks
    - Exploitation of a new persistent Globus jobmanager
- Actively following the developments of Globus GRAM and Condor-G, and implementing the required customizations

Page 13: Grid Workload Management Massimo Sgaravatto INFN Padova.

Bookkeeping & Logging

- Job monitoring and control
  - Job status
  - Used resources
  - Start time
  - End time
  - …
- Record of significant events occurring in the workload management system
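
The two kinds of information listed above, per-job bookkeeping data and a log of significant events, could be modelled roughly as follows. This is a sketch under assumptions: the class names, field names and status strings are hypothetical and do not describe the actual bookkeeping and logging services.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class JobRecord:
    """Bookkeeping data for one job (field names are assumptions)."""
    job_id: str
    status: str = "SUBMITTED"
    used_resources: dict = field(default_factory=dict)
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


@dataclass
class EventLog:
    """Append-only record of significant events in the system."""
    events: list = field(default_factory=list)

    def record(self, job_id, description, when=None):
        """Store an event as a (timestamp, job_id, description) tuple."""
        self.events.append((when or datetime.now(), job_id, description))

    def for_job(self, job_id):
        """Return all events recorded for one job, in insertion order."""
        return [event for event in self.events if event[1] == job_id]


if __name__ == "__main__":
    log = EventLog()
    job = JobRecord(job_id="job-1")
    log.record(job.job_id, "submitted to broker")
    job.status = "RUNNING"
    job.start_time = datetime.now()
    log.record(job.job_id, "started on resource")
    print(job.status, len(log.for_job("job-1")))
```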

Page 14: Grid Workload Management Massimo Sgaravatto INFN Padova.

User interface

- Command-line, for job management operations:
  - List of resources “suitable” to run a job
  - Job submission (with the possibility to specify where to submit the job, or to leave this choice to the broker)
  - Job status monitoring
  - Job removal
  - Access to bookkeeping info for the job
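
As an illustration of how these operations might map onto a command-line tool, here is a minimal sketch using Python's standard `argparse` module. The program name `gridjob`, the subcommand names and the `--resource` option are all hypothetical; the slides do not specify the actual commands.

```python
import argparse


def build_parser():
    """Build a parser with one hypothetical subcommand per listed operation."""
    parser = argparse.ArgumentParser(prog="gridjob")
    sub = parser.add_subparsers(dest="command", required=True)

    # List the resources "suitable" to run the job described in a JDL file.
    match = sub.add_parser("list-match", help="list suitable resources")
    match.add_argument("jdl_file")

    # Submit a job; without --resource the choice is left to the broker.
    submit = sub.add_parser("submit", help="submit a job")
    submit.add_argument("jdl_file")
    submit.add_argument("--resource", default=None,
                        help="explicit target resource (default: broker decides)")

    # Status monitoring, removal and bookkeeping info all act on a job id.
    for name, text in (("status", "show job status"),
                       ("cancel", "remove a job"),
                       ("info", "show bookkeeping info")):
        command = sub.add_parser(name, help=text)
        command.add_argument("job_id")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Making the target resource an optional argument keeps both submission modes in a single command: the user either names a resource or omits the option and lets the broker choose.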

Page 15: Grid Workload Management Massimo Sgaravatto INFN Padova.

Workload management system (1st prototype)

[Figure: block diagram of the first prototype. Labels recovered from the diagram:]

- Submit jobs (using JDL [ClassAds])
- Broker: chooses to which Globus resources the jobs must be submitted
- Resource discovery: Broker to GIS + Replica Catalog
- GIS + Replica Catalog: information on characteristics and status of local resources; other info
- Job submission service (Condor-G): Condor-G able to provide a reliable/crash-proof job submission service
- Globus GRAM at Site 1, Site 2 and Site 3: uniform interface to different local resource management systems
- Local resource management systems: Condor, LSF, PBS
- Farms

Page 16: Grid Workload Management Massimo Sgaravatto INFN Padova.

Grid Accounting

- New problem
  - Working systems (even prototype implementations) don’t exist yet
- Economy-based model for Grid accounting?
  - See Stefano’s presentation

Page 17: Grid Workload Management Massimo Sgaravatto INFN Padova.

Deliverables foreseen in the INFN-GRID proposal

- D2.1.1 Technical assessment about Globus and Condor, interactions and usage (5/2001)
  - Done
- D2.1.2 First resource broker implementation for high throughput applications (7/2001)
  - The resource broker should be easily customizable for high throughput applications
  - Usable after M9 release

Page 18: Grid Workload Management Massimo Sgaravatto INFN Padova.

Deliverables foreseen in the INFN-GRID proposal

- D2.1.3 Comparison of different local resource managers (10/2001)
  - Condor, LSF, PBS
  - Farms with these resource management systems already in place and instrumented with the Globus software
- D2.1.4 Study of the three workload systems and implementation of the workload system for Monte Carlo productions (12/2001)
  - Should be achievable