Grid Workload Management Massimo Sgaravatto INFN Padova.
Grid Workload Management
Massimo Sgaravatto, INFN Padova
Grid Workload Management WP
- Goal: define and implement a suitable architecture for distributed scheduling and resource management in a Grid environment
  - Large heterogeneous environment
  - Large numbers (thousands) of independent users
- Many challenging issues:
  - Optimizing the choice of execution location based on the availability of data, computation and network resources
  - Uniform interface to possibly different local resource management systems under different administrative domains
  - Priorities, policies on resource usage
  - Reliability, scalability, …
  - …
- http://www.infn.it/workload-grid
Approach
- We need much more experience with the various grid issues
- The application requirements are not completely defined yet
  - They will evolve as more familiarity with the grid model is acquired
- Fast prototyping instead of a classic top-down approach
Current activities
- Report on current technology on Grid scheduling and resource management
  - Globus resource management
  - Condor
  - Survey on Grid scheduling systems
- Focus on the implementation of a first prototype workload management system
  - This part will be plugged together with the parts implemented by the other WPs to form the project month 9 (September) deliverable
- Grid accounting
Functionalities foreseen for the 1st release
- First version of job description language (JDL)
- First version of resource broker
- Job submission service
- First version of bookkeeping and logging services
- First user interface
Block diagram of the currently foreseen components of the workload management system
- Not a real architecture
- Functional interactions among the various components
- Dependencies on “external” functionalities
![Page 7: Grid Workload Management Massimo Sgaravatto INFN Padova.](https://reader036.fdocuments.us/reader036/viewer/2022082818/56649ee45503460f94bf388e/html5/thumbnails/7.jpg)
Job Description Language (JDL)
- First release of job description language (JDL), used when the job is submitted, to specify the job characteristics (application, input data set id, resources [required and preferable], …)
- A document describing the syntax and semantics of a “prototype” JDL, based on Condor ClassAds, was prepared
  - Ready to collect feedback from applications
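
As a rough illustration of the kind of information such a job description carries, the sketch below models a job as a plain Python dictionary in the spirit of ClassAd attribute/expression pairs. It is not the prototype JDL syntax: the attribute names (`executable`, `input_data`, `requirements`, `rank`), the file name, the data set id and the resource attributes are all hypothetical.

```python
# Hypothetical job description, loosely modeled on ClassAd-style
# attribute/expression pairs. This is NOT the prototype JDL syntax.
job = {
    "executable": "simulate.sh",     # application to run (made-up name)
    "input_data": "dataset-001",     # input data set id (made-up id)
    # Required resources: must evaluate to True for a resource to qualify.
    "requirements": lambda r: r["os"] == "linux" and r["free_cpus"] >= 2,
    # Preferable resources: a higher value means a better match.
    "rank": lambda r: r["free_cpus"],
}

# Hypothetical description of one computing resource.
resource = {"name": "farm-a", "os": "linux", "free_cpus": 8}

matches = job["requirements"](resource)   # does the resource satisfy the job?
preference = job["rank"](resource)        # how preferable is this resource?
print(matches, preference)
```

Keeping "required" and "preferable" as two separate expressions mirrors the distinction made on the slide: the first filters resources, the second orders the ones that pass.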
Resource Broker
- First version of resource broker, that chooses the computing resources (queues or “single” nodes) where to submit jobs, considering:
  - Access policies (grid-mapfiles in the Globus-based prototype)
  - Characteristics and status of resources
  - Availability of input data set
  - Availability of the required run time/application environments
- Resources required specified in the JDL
- Resources required published in an Information Space (Globus GIS in the first prototype) + Replica Catalog
- Ongoing implementation based on the Condor matchmaking library (Salvatore’s presentation)
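
The selection criteria listed above can be sketched as a filter-then-rank step. This is a minimal illustration, not the Condor matchmaking library and not the prototype's code: the function `choose_resource`, its parameters and the resource attributes are hypothetical, and data availability is treated as a hard filter, which is a simplifying assumption.

```python
from typing import Callable, Dict, List, Optional, Set

Resource = Dict[str, object]  # hypothetical: attribute name -> value


def choose_resource(
    resources: List[Resource],
    authorized: Set[str],
    data_location: Dict[str, Set[str]],
    input_data: str,
    requirements: Callable[[Resource], bool],
    rank: Callable[[Resource], float],
) -> Optional[Resource]:
    """Return the best candidate resource, or None if nothing matches."""
    holders = data_location.get(input_data, set())
    candidates = [
        r for r in resources
        if r["name"] in authorized      # access policy
        and r["name"] in holders        # input data set available
        and requirements(r)             # characteristics, status, environment
    ]
    return max(candidates, key=rank, default=None)


# Made-up resources and catalog contents, for illustration only.
resources = [
    {"name": "site1-condor", "free_cpus": 4, "env": {"app-v1"}},
    {"name": "site2-lsf", "free_cpus": 16, "env": {"app-v1"}},
    {"name": "site3-pbs", "free_cpus": 32, "env": set()},
]
authorized = {"site1-condor", "site2-lsf", "site3-pbs"}
data_location = {"dataset-001": {"site1-condor", "site2-lsf", "site3-pbs"}}

best = choose_resource(
    resources, authorized, data_location, "dataset-001",
    requirements=lambda r: "app-v1" in r["env"] and r["free_cpus"] > 0,
    rank=lambda r: r["free_cpus"],
)
print(best["name"] if best else None)
```

Here `site3-pbs` has the most free CPUs but lacks the required application environment, so it is filtered out before ranking.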
Information Service
- All the information needed by the broker published in one Grid Information Space (Globus GIS/MDS for the first release)
- New MDS 2 alpha release soon available
  - Should address some of the existing shortcomings
- Necessary to implement plug-in modules
  - Index (for a first-level query, to identify a set of candidate resources)
  - Information providers (to publish needed information about resources)
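
The split between an index and information providers can be pictured as a two-level lookup: a coarse query returns candidate resources, and details are fetched from each candidate's provider. The sketch below is only a toy in-memory model under that assumption. It does not use the MDS interfaces, and the class `InformationIndex`, its methods and the published attributes are hypothetical.

```python
from typing import Callable, Dict, List

# An information provider publishes attributes about one resource.
Provider = Callable[[], Dict[str, object]]


class InformationIndex:
    """Toy index: coarse first-level query, details from providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}
        self._kinds: Dict[str, str] = {}   # resource name -> resource kind

    def register(self, name: str, kind: str, provider: Provider) -> None:
        """Attach a provider and record the coarse data used by the index."""
        self._providers[name] = provider
        self._kinds[name] = kind

    def candidates(self, kind: str) -> List[str]:
        """First-level query: names of the resources of the given kind."""
        return sorted(n for n, k in self._kinds.items() if k == kind)

    def details(self, name: str) -> Dict[str, object]:
        """Second-level query: ask the provider for current information."""
        return self._providers[name]()


index = InformationIndex()
index.register("site1", "queue", lambda: {"free_cpus": 4})
index.register("site2", "queue", lambda: {"free_cpus": 16})
index.register("node7", "node", lambda: {"free_cpus": 1})

queues = index.candidates("queue")
print(queues, [index.details(name) for name in queues])
```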
Job submission service
- Job submission service based (for the first release) on:
  - Globus GRAM
  - Condor-G on top of Globus GRAM (to implement a reliable job submission service)
- Globus GRAM
  - Comprehensive evaluation already done (collaboration with the “Evaluation of the Globus toolkit” WP)
  - Globus GRAM as uniform interface to different underlying resource management systems (LSF, Condor, PBS)
  - GRAM reporter (GRAM – GIS interaction)
  - RSL
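
The idea of a uniform interface in front of different local resource management systems can be sketched with a common abstract class and interchangeable backends. This is an illustration of the pattern only. It does not call GRAM, LSF, Condor or PBS, and the classes `LocalResourceManager` and `FakeBackend`, the function `submit_to` and the site names are hypothetical.

```python
from abc import ABC, abstractmethod
from itertools import count
from typing import Dict, List, Tuple


class LocalResourceManager(ABC):
    """Common interface the upper layers see, whatever runs underneath."""

    @abstractmethod
    def submit(self, executable: str) -> str:
        """Queue the executable and return a job identifier."""


class FakeBackend(LocalResourceManager):
    """Stand-in for a real batch system; it only records submissions."""

    def __init__(self, system: str) -> None:
        self.system = system
        self.queue: List[Tuple[str, str]] = []
        self._ids = count(1)

    def submit(self, executable: str) -> str:
        job_id = f"{self.system}-{next(self._ids)}"
        self.queue.append((job_id, executable))
        return job_id


# One backend per site; the caller never needs to know which one it gets.
sites: Dict[str, LocalResourceManager] = {
    "site1": FakeBackend("condor"),
    "site2": FakeBackend("lsf"),
    "site3": FakeBackend("pbs"),
}


def submit_to(site: str, executable: str) -> str:
    """Submit through the uniform interface, regardless of the backend."""
    return sites[site].submit(executable)


ids = [submit_to(site, "simulate.sh") for site in ("site1", "site2", "site3")]
print(ids)
```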
Job submission service
- Condor-G
  - First prototype implementation already tested
    - Promising, but many problems to fix
  - New Condor-G implementation under testing
    - Many problems fixed, but still other open issues
  - Other new Condor-G implementation released hopefully in a few weeks
    - Exploitation of a new persistent Globus jobmanager
- Active in following the developments of Globus GRAM, Condor-G, implementing the required customizations
Bookkeeping & Logging
- Job monitoring and control
  - Job status
  - Used resources
  - Start time
  - End time
  - …
- Record of significant events occurring in the workload management system
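
A minimal sketch of what a per-job bookkeeping record with an attached event log could look like is given below. The class `JobRecord`, its field names and the status values (`SUBMITTED`, `RUNNING`, `DONE`) are hypothetical, and wall-clock time stands in for "used resources" as a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class JobRecord:
    """Bookkeeping data for one job (field names are illustrative)."""
    job_id: str
    status: str = "SUBMITTED"            # hypothetical status names
    start_time: Optional[float] = None
    end_time: Optional[float] = None
    events: List[Tuple[float, str]] = field(default_factory=list)

    def log(self, timestamp: float, event: str) -> None:
        """Logging: append a significant event with its timestamp."""
        self.events.append((timestamp, event))

    def start(self, timestamp: float) -> None:
        self.status, self.start_time = "RUNNING", timestamp
        self.log(timestamp, "job started")

    def finish(self, timestamp: float) -> None:
        self.status, self.end_time = "DONE", timestamp
        self.log(timestamp, "job finished")

    def wall_time(self) -> Optional[float]:
        """Used wall-clock time, or None while the job has not finished."""
        if self.start_time is None or self.end_time is None:
            return None
        return self.end_time - self.start_time


record = JobRecord("job-1")
record.start(100.0)
record.finish(160.0)
print(record.status, record.wall_time(), len(record.events))
```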
User interface
- Command-line, for job management operations
  - List of resources “suitable” to run a job
  - Job submission (with the possibility to specify where to submit the job, or leaving this choice to the broker)
  - Job status monitoring
  - Job removal
  - Access to bookkeeping info for the job
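
The operations listed above map naturally onto subcommands of a single command-line tool. The sketch below shows one possible shape using Python's `argparse`. The program name `wms`, the subcommand names and the `--resource` option are hypothetical and are not the commands of the actual prototype.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per job management operation."""
    parser = argparse.ArgumentParser(prog="wms")  # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("list-match", help="list resources suitable to run a job")
    p.add_argument("jdl_file")

    p = sub.add_parser("submit", help="submit a job")
    p.add_argument("jdl_file")
    # If omitted, the choice of resource is left to the broker.
    p.add_argument("--resource", default=None, help="where to submit the job")

    for name, text in (
        ("status", "show the status of a job"),
        ("cancel", "remove a job"),
        ("info", "show bookkeeping info for a job"),
    ):
        p = sub.add_parser(name, help=text)
        p.add_argument("job_id")

    return parser


args = build_parser().parse_args(["submit", "job.jdl"])
print(args.command, args.resource)
```

Leaving `--resource` unset corresponds to the case where the broker chooses the destination.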
Workload management system (1st prototype)
[Block diagram; labels recoverable from the slide:]
- Submit jobs (using JDL [ClassAds])
- Broker: chooses in which Globus resources the jobs must be submitted
- GIS + Replica Catalog: resource discovery; information on characteristics and status of local resources; other info
- Job submission service (Condor-G): Condor-G able to provide a reliable/crash-proof job submission service
- Globus GRAM as uniform interface to different local resource management systems
- Local resource management systems (Condor, LSF, PBS) behind Globus GRAM at Site 1, Site 2, Site 3
- Farms
Grid Accounting
- New problem
  - Working systems (even prototype implementations) don’t exist yet
- Economy-based model for Grid accounting?
  - See Stefano’s presentation
Deliverables foreseen in the INFN-GRID proposal
- D2.1.1 Technical assessment about Globus and Condor, interactions and usage (5/2001)
  - Done
- D2.1.2 First resource broker implementation for high throughput applications (7/2001)
  - The resource broker should be easily customizable for high throughput applications
  - Usable after M9 release
Deliverables foreseen in the INFN-GRID proposal
- D2.1.3 Comparison of different local resource managers (10/2001)
  - Condor, LSF, PBS
  - Farms with these resource management systems already in place and instrumented with the Globus software
- D2.1.4 Study of the three workload systems and implementation of the workload system for Monte Carlo productions (12/2001)
  - Should be achievable