Enabling Grids for E-sciencE
www.eu-egee.org
Workload Management System on gLite middleware
Matthieu Reichstadt CNRS/IN2P3
ACGRID School,
Hanoi (Vietnam) November 5th, 2007
Credits: Valeria Ardizzone and other EGEE colleagues…
ACGrid School 5-9/11 2007
Outline
• Overview of WMS Architecture
– Task Queue, Information Supermarket, MatchMaker, Scheduling Policies, Job Submission Service, Job Logging & Bookkeeping
• Job Description Language
– Overview
– Basic attributes
– Advanced attributes
• Practice
– Command line
– Exercises
Workload Management System (WMS)
• Is the gLite3 component that allows users to submit jobs.
• Performs all tasks required to execute jobs.
• Comprises a set of Grid middleware components responsible for distribution and management of tasks across Grid resources.
• Hides the complexity of the Grid from the user.
WMS’s Architecture
• Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL).
• Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources.
• Repository of resource information available to the matchmaker.
• Updated via notifications and/or active polling on resources.
• Keeps submission requests.
• Requests are kept for a while if no resources are immediately available.
• Performs the actual job submission and monitoring.
WMS Components (1)
• Network Server (NS) - WMProxy
– Accepts incoming requests from the UI (job submission, job removal)
– If they are valid, passes them to the Workload Manager
WMS Components (2)
• Workload Manager (WM)
– Core component of the WMS
– Takes appropriate actions to satisfy requests
• Resource Broker (MatchMaker), RB
– Finds the resources that best match the request
• Information SuperMarket (ISM)
– Repository of resource information, available in read-only mode to the RB
• Task Queue
– Makes it possible to keep a request if no resources are immediately available
– A request that matches no resource is either retried periodically (eager scheduling) or waits for a notification of available resources (lazy scheduling)
WMS Components (3)
• Eager scheduling (“push” model)
– A job is bound to a resource as soon as possible. Once the decision has been taken, the job is passed to the selected resource for execution.
• Lazy scheduling (“pull” model)
– The job is held by the WM until a resource becomes available. When this happens, the resource is matched against the submitted jobs.
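The difference between the two models can be sketched in a few lines of Python. This is only an illustration, not the WMS implementation: the function names, the job and resource identifiers, and the "first free resource wins" rule are assumptions, and real matchmaking also evaluates each job's requirements.

```python
from collections import deque


def eager_schedule(jobs, free_resources):
    """Push model: bind each job to a resource as soon as one is known."""
    resources = deque(free_resources)
    assignments, pending = {}, []
    for job in jobs:
        if resources:
            assignments[job] = resources.popleft()
        else:
            # No resource right now: the request would be retried periodically.
            pending.append(job)
    return assignments, pending


def lazy_schedule(held_jobs, resource_events):
    """Pull model: jobs are held until a resource announces it is available."""
    held = deque(held_jobs)
    assignments = {}
    for resource in resource_events:
        if not held:
            break
        # The newly available resource is matched against the held jobs.
        assignments[held.popleft()] = resource
    return assignments, list(held)
```

In the eager sketch the decision is driven by the arrival of jobs, while in the lazy sketch it is driven by the arrival of resource notifications.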
WMS Components (4)
WMS components that handle the job during its lifetime and perform the submission:
• Job Adapter (JA)
– is responsible for making the final touches to the JDL expression for a job before it is passed to CondorC for the actual submission
– creates the job wrapper script that sets up the appropriate execution environment on the CE worker node (including the transfer of the input and output sandboxes)
• CondorC
– is responsible for performing the actual job management operations (job submission, job removal)
WMS Components (5)
• Log Monitor (LM) is responsible for
– watching the CondorC log file
– intercepting interesting events concerning active jobs
• Proxy Renewal Service is responsible for
– ensuring that a valid user proxy exists within the WMS for the whole lifetime of a job
– contacting the MyProxy Server in order to renew the user's credential
• Logging & Bookkeeping (LB) is responsible for
– storing events generated by the various components of the WMS
– answering queries: by querying the LB, the user can retrieve information about the job status
Job State Machine
• Submitted: the job has been entered by the user at the User Interface but not yet transferred to the Network Server for processing.
• Waiting: the job has been accepted by the NS and is waiting for Workload Manager processing, or is being processed by WMHelper modules.
• Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
• Scheduled: the job is waiting in the queue on the CE.
• Running: the job is running on a Worker Node.
• Done: the job has exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
• Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, expiration of user credentials).
• Cancelled: the job has been successfully cancelled on user request.
• Cleared: the output sandbox was transferred to the user or removed due to the timeout.
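The nine states can be written down as a small Python model. The state names come from the slides; the transition table is an assumption made for illustration (a linear normal path, with Aborted and Cancelled reachable from the intermediate states) and is not taken from the gLite documentation.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


S = JobState

# Normal path, in the order the states are presented in the slides.
NORMAL_PATH = [S.SUBMITTED, S.WAITING, S.READY, S.SCHEDULED,
               S.RUNNING, S.DONE, S.CLEARED]

# Assumed transition table: each state on the normal path leads to the next.
TRANSITIONS = {src: {dst} for src, dst in zip(NORMAL_PATH, NORMAL_PATH[1:])}

# Assumption: a job can be aborted or cancelled while the WMS or CE handles it.
for state in (S.WAITING, S.READY, S.SCHEDULED, S.RUNNING):
    TRANSITIONS[state] |= {S.ABORTED, S.CANCELLED}


def can_move(src, dst):
    """Return True if the assumed table allows going from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```

For example, `can_move(JobState.RUNNING, JobState.DONE)` is allowed, while jumping directly from Submitted to Running is not.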
Job Description Language
The JDL language
• The Job Description Language (JDL) describes jobs for execution on the Grid.
• The JDL adopted within the gLite middleware is based upon Condor’s CLASSified ADvertisement language (ClassAd).
• A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;).
• A ClassAd is highly flexible and can be used to represent arbitrary services.
• The JDL file is processed by the match-making process to select the best resource that satisfies the job’s requirements.
The JDL language
• JDL file lines have the format:
Attribute = expression;
• There are 2 categories of attributes:
1. Job attributes define the job itself
2. Resource attributes indicate the job constraints in terms of:
– Computing resources
– Data and storage resources
• Comments are indicated by # or //
• The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
JDL : basic attributes
• In a JDL, some attributes are mandatory while others are optional.
• An “essential” JDL is the following:
[
Executable = "test.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"test.sh"};
OutputSandbox = {"std.out","std.err"};
]
• If needed, arguments to the executable can be passed:
Arguments = "arguments list";
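As a rough illustration of the "Attribute = expression;" format, the following Python sketch renders a dictionary as JDL text. The helper name `to_jdl` is hypothetical and not part of any gLite tool; it only handles strings and lists of strings, and it does not escape special characters inside values.

```python
def to_jdl(attrs):
    """Render a dict as JDL text, one 'Attribute = expression;' per line."""

    def render(value):
        # Lists become {"a","b"}; everything else becomes a quoted string.
        if isinstance(value, (list, tuple)):
            return "{" + ",".join(render(item) for item in value) + "}"
        return '"' + str(value) + '"'

    # No blank or tab is emitted after the closing semicolon of a line.
    lines = [f"{name} = {render(value)};" for name, value in attrs.items()]
    return "[\n" + "\n".join(lines) + "\n]"


essential = {
    "Executable": "test.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["test.sh"],
    "OutputSandbox": ["std.out", "std.err"],
}
print(to_jdl(essential))
```

Printing `to_jdl(essential)` reproduces the essential JDL from the slide, with straight double quotes around every string.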
JDL : basic attributes
Executable = < string > (mandatory)
• represents the executable/command name
• you can specify an executable that:
– already exists on the remote WN
– will be copied from the UI to the WN
• the arguments are specified in a separate attribute (Arguments)
JDL : basic attributes
Arguments = < string > (optional)
• arguments for the executable file, e.g. "-out outputfile.dat"
• with Executable = "execprog"; on the Worker Node (WN) we will have:
$ execprog -out outputfile.dat
• double-quote characters (") inside the string should be preceded by \
"-a \"quoted string\" -bcd" becomes:
$ execprog -a "quoted string" -bcd
• special characters (&, |, >, <) should be preceded by a triple \ :
Arguments = "-f file1\\\&file2";
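The two escaping rules for Arguments can be sketched as a small Python helper. The function name `jdl_arguments` is hypothetical, and the sketch covers only the characters named on the slide (the double quote and &, |, >, <); other characters are passed through unchanged.

```python
def jdl_arguments(raw):
    """Build a JDL Arguments line from a raw argument string."""
    escaped = []
    for char in raw:
        if char == '"':
            # A double quote is preceded by one backslash.
            escaped.append('\\"')
        elif char in "&|<>":
            # A special character is preceded by three backslashes.
            escaped.append("\\\\\\" + char)
        else:
            escaped.append(char)
    return 'Arguments = "' + "".join(escaped) + '";'


print(jdl_arguments('-a "quoted string" -bcd'))
print(jdl_arguments("-f file1&file2"))
```

The two calls print the JDL lines that correspond to the quoted-string and file1&file2 examples on the slide.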
JDL : basic attributes
StdOutput, StdError, StdInput = < string > (optional)
• paths of the output / error / input files
• StdOutput and StdError:
– must also be listed in the OutputSandbox
– can have the same value
JDL : basic attributes
InputSandbox = < string | string list > (optional)
• contains the input files to be copied from the UI to the WN before the job execution
• only local UI files (for LFNs use the InputData attribute)
• each file must not exceed 10 MB
• files must have different names (they are all copied to the same destination directory)
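The two InputSandbox constraints (size limit per file, distinct file names) lend themselves to a quick pre-submission check. The sketch below is hypothetical and not part of the gLite tools: the function name is invented, file sizes are passed in rather than read from disk, and "10 MB" is interpreted as 10 * 1024 * 1024 bytes, which is an assumption.

```python
import os

# Assumption: "10 MB" is taken to mean 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def check_input_sandbox(sizes):
    """Check a mapping of local path -> size in bytes; return a list of problems."""
    problems = []
    seen = {}
    for path, size in sizes.items():
        name = os.path.basename(path)
        if size > MAX_BYTES:
            problems.append(f"{path}: larger than 10 MB")
        if name in seen:
            # All files land in the same directory, so names must differ.
            problems.append(f"{path}: same file name as {seen[name]}")
        else:
            seen[name] = path
    return problems
```

An empty list means the sandbox satisfies both constraints; two files such as a/data.txt and b/data.txt would be reported because they collide in the destination directory.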
JDL : basic attributes
OutputSandbox = < string | string list >
• contains the output files to be transferred from the WN to the UI after the job execution
• files must have different names (they are all copied to the same destination directory)
JDL : advanced attributes
Requirements (mandatory)
• Job requirements on Grid resources (CE, SE, …)
• Evaluation performed by the Match Maker
• Specified using attributes published by the Information Service
• If not specified, the default value is:
Requirements = other.GlueCEStateStatus == "Production";
Examples:
Requirements = other.GlueCEUniqueID == "clrlcgce01.in2p3.fr:2119/jobmanager-lcgpbs-auvergrid";
Requirements = Member("AUVERGRID-3.07.01", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;
JDL : advanced attributes
Rank (mandatory)
• Floating-point expression used to rank the CEs that have already met the Requirements expression.
• Can contain attributes that describe the CE in the Information System (IS).
• Evaluation is performed by the Resource Broker (RB) during the match-making phase.
• A higher numeric value means a better rank.
• If not specified, the default value is:
Rank = -other.GlueCEStateEstimatedResponseTime;
E.g.: Rank = other.GlueCEStateFreeCPUs;
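How Requirements and Rank work together can be sketched in Python: Requirements filters the CEs, then Rank orders the survivors with the highest value first. This is a simplified model, not the Resource Broker's code. The CE records and their host names are invented for illustration; only the Glue attribute names and the two default expressions come from the slides.

```python
def match(ces, requirements, rank):
    """Keep the CEs that satisfy requirements, best rank first."""
    eligible = [ce for ce in ces if requirements(ce)]
    return sorted(eligible, key=rank, reverse=True)


def default_requirements(ce):
    # Requirements = other.GlueCEStateStatus == "Production";
    return ce["GlueCEStateStatus"] == "Production"


def default_rank(ce):
    # Rank = -other.GlueCEStateEstimatedResponseTime;
    return -ce["GlueCEStateEstimatedResponseTime"]


# Hypothetical CE records, as the Information System might publish them.
CES = [
    {"GlueCEUniqueID": "ce-a.example.org", "GlueCEStateStatus": "Production",
     "GlueCEStateEstimatedResponseTime": 600, "GlueCEStateFreeCPUs": 8},
    {"GlueCEUniqueID": "ce-b.example.org", "GlueCEStateStatus": "Production",
     "GlueCEStateEstimatedResponseTime": 120, "GlueCEStateFreeCPUs": 2},
    {"GlueCEUniqueID": "ce-c.example.org", "GlueCEStateStatus": "Draining",
     "GlueCEStateEstimatedResponseTime": 10, "GlueCEStateFreeCPUs": 32},
]

best_first = match(CES, default_requirements, default_rank)
print([ce["GlueCEUniqueID"] for ce in best_first])
```

With the default expressions, ce-c is filtered out because it is not in Production, and ce-b is preferred over ce-a because its estimated response time is lower. Ranking by `GlueCEStateFreeCPUs` instead would put ce-a first.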
JDL : advanced attributes
Environment = < string | string list > (optional)
• environment variables
• string format: < variable name > = < string >
• example:
Environment = { "JOB_LOG_FILE=/tmp/job.log", "INP_DIR=/tmp/input_files" };
References
• EGEE User Guide: https://edms.cern.ch/file/722398//gLite-3-UserGuide.pdf
• JDL Attributes: https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JDL-Attributes-v0-8.pdf
Thank you