Elisabetta Ronchieri - EDG – WP1 (Workload Management System) Activities - 8/12/2002 - n° 1
EDG – WP1 (Work Load Management System) Activities
Plans
elisabetta.ronchieri@cnaf.infn.it
A new era
Architecture has been revised
Increase reliability and flexibility of the system
Simplify the whole system (e.g. minimize duplication of persistent information)
Make it easy to plug in new components that implement new functionalities
Address some of the shortcomings that emerged in the first DataGrid testbed
Favor interoperability with other Grid frameworks, by allowing WP1 modules to be used outside the WP1 WMS as well
New functionalities are supported
Coordination between EDG WP1 and PPDG has been established to define common guidelines
New Functionalities
Interactive jobs
Job Checkpointing
Job Partitioning
Job Dependencies
Integration with WP2 Query Optimization Service
C++ and Java API, and GUI
Deployment of Accounting infrastructure over Testbed (HLRs with command line interface)
Advance reservation API
Co-allocation API
RB relying on the GLUE schema
New features: Interactive Jobs
An interactive job is a job with continuous feedback, i.e. a job for which the user needs the standard streams (stdin, stdout, and stderr) on the UI (submitting) machine.
The connection between WN and UI is always opened from the job side (we assume OutBoundIP connectivity is available from the WNs).
We do NOT support:
remote signal sending
asynchronous interaction with the job
Possible extensions will be evaluated after the first deployment phase.
We use an existing tool, Condor Bypass (Grid Console):
http://www.cs.wisc.edu/condor/bypass
Bypass : What is it ? 1/7
Bypass is a tool for writing interposition agents and split execution systems.
Most applications communicate with the operating system via a standard library which converts their procedure calls into appropriate kernel operations.
An interposition agent is a piece of software which transforms a program’s operation by interposing itself between the program and the operating system.
An interposition agent squeezes itself into an existing program and modifies its behavior.
So, when the program attempts certain system calls, the agent grabs control and manipulates the results.
An agent can be used to instrument programs, to attach them to new systems, and to emulate operations that otherwise might not be available.
Bypass : What is it? 2/7
Bypass allows you to:
Split and dynamically link applications
Transparently use heterogeneous systems
Trap calls with minimal overhead
Control execution paths with plain C
Combine small agents
Bypass language:
Declare what procedures to trap in C++
Annotate pointer types with data flow (direction and binary data)
Give two function bodies: agent_action and shadow_action
So, for example, the programmer provides a specification that lists which system calls are to be trapped and the code to replace them. Bypass parses the specification and produces C++ code for an agent.
Bypass : Grid Console (GC) 3/7
The Grid Console is a system for getting mostly-continuous input/output from remote programs running on an unreliable network
The GC is robust to many types of failures that can take place in such a context (e.g. crashed machines, partitioned networks, full disks)
Its first priority is to keep jobs running
Its second priority is to keep the output moving when conditions permit
The GC is implemented using Bypass
GC consists of two software components: an agent and a shadow
The agent intercepts reads and writes on stdin, stdout and stderr. All other operations are untouched. Reads and writes on these streams are forwarded to the shadow for execution.
Bypass : Example 4/7
File: simple.bypass
ssize_t write( int fd, in "length" const void *data, size_t length )
agent_action
{{
    if (fd < 3) {
        return bypass_shadow_write(fd, data, length);
    } else {
        return write(fd, data, length);
    }
}}
shadow_action
{{
    return write(fd, data, length);
}};
Bypass : agent_action and shadow_action 5/7
An agent action:
Is any arbitrary C++ code
When a program invokes write(), the agent_action is executed at the foreign machine
Within the agent_action:
write() – invokes the original write() at the foreign machine
bypass_shadow_write() – invokes the shadow action via RPC
A shadow action:
Is any arbitrary C++ code
If the agent decides to invoke the RPC to the shadow, the shadow_action is executed at the home machine
Within the shadow_action:
write() – invokes write() at the home machine
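The split between agent and shadow can be illustrated with a small Python sketch. This is not Bypass itself: the `Shadow` and `Agent` classes are hypothetical stand-ins, the RPC is replaced by a direct method call, and in-memory buffers stand in for real file descriptors. It only shows the routing rule of the write() example, where descriptors below 3 go to the shadow and everything else stays local.

```python
import io


class Shadow:
    """Hypothetical stand-in for the home-machine side (owns the user's streams)."""

    def __init__(self):
        self.streams = {fd: io.BytesIO() for fd in range(3)}

    def write(self, fd, data):
        # Plays the role of shadow_action: a plain write at the home machine.
        return self.streams[fd].write(data)


class Agent:
    """Hypothetical stand-in for the agent running next to the program."""

    def __init__(self, shadow, local_files):
        self.shadow = shadow
        self.local_files = local_files  # fd -> file-like object on the foreign machine

    def write(self, fd, data):
        # Plays the role of agent_action: standard streams are forwarded,
        # every other descriptor is written locally.
        if fd < 3:
            return self.shadow.write(fd, data)  # an RPC in real Bypass
        return self.local_files[fd].write(data)


shadow = Shadow()
local = {5: io.BytesIO()}
agent = Agent(shadow, local)
agent.write(1, b"hello")  # ends up in the shadow's stdout buffer
agent.write(5, b"data")   # stays on the foreign machine
```

Only the routing decision lives in the agent; the shadow never sees writes to ordinary files.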
Bypass : How to use it! 6/7
Run “bypass” to read the specification and produce C++ source code:
% bypass -agent -shadow simple.bypass
The shadow is compiled into a plain executable
The agent is compiled into a shared library
The dynamic linker is used to force the agent into an executable at run-time:
setenv LD_PRELOAD simple_agent.so (csh/tcsh)
export LD_PRELOAD=simple_agent.so (sh/bash)
Procedure calls are trapped merely by putting the agent first in the link list
This method can be used on any dynamically-linked program: tcsh, emacs, …
Can Bypass be used by a real user ? 7/7
Bypass works on unmodified executables. Real users are not willing/able to rewrite/recompile their programs.
Bypass requires no special privileges. Real users do not have the root password.
So, Bypass allows a real user to make good use of a remote machine without begging the administrator to configure it to his/her needs.
How to use Bypass GC in WP1 1/2
A Job Shadow is the Grid Console Shadow running on the UI machine.
A Pillow process is a process started on the WN just before the job, which intercepts the job's standard streams.
The Pillow process is linked against a Job Agent, which is a slightly modified Grid Console Interposition Agent.
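[Figure: components shown are the submission machine (User Interface, Job Shadow), the CE (Gatekeeper, LRMS, worker nodes WN1 … WNn running the Job, the Pillow Process and the Console Agent), the LB (shadow port, shadow host), and the stdin, stdout and stderr streams between the WN and the Job Shadow.]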
How to use Bypass GC in WP1 2/2
Job submission goes through the usual command (dg-job-submit)
The attribute “JobType” is set to “Interactive”.
Other attributes are:
ShadowPort (not mandatory)
ShadowHost (always filled in by the UI)
The UI starts the Job Shadow process on the submitting machine, at the specified port
The UI writes the ShadowPort and ShadowHost values to the LB
In case of crash at the UI side
dg-job-attach <jobID>
If the job is still running, it reads the ShadowPort from the LB
It restarts the shadow on that port
If the port is not available, it starts the shadow on a different port and stores the new port in the LB
On the WN, the agent retries contacting the shadow
After a number of failures, it queries the LB for the ShadowPort
If the port has changed, it tries to contact the shadow at the new port
If it fails again, it gives up and the job is aborted
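The agent-side recovery steps can be sketched in Python as follows. This is an illustration only: `connect` and `lookup_shadow_port` are hypothetical callables standing in for the real network connection and the LB query, and `max_failures` is an assumed parameter (the slides only say "a number of failures"). The sketch also assumes that an unchanged port in the LB means the agent gives up, which the slides do not state explicitly.

```python
from typing import Callable, Optional


def reconnect_to_shadow(
    connect: Callable[[str, int], bool],
    lookup_shadow_port: Callable[[], int],
    host: str,
    port: int,
    max_failures: int = 3,
) -> Optional[int]:
    """Return the port on which the shadow answered, or None if the job must abort."""
    # Retry the port the agent already knows about.
    for _ in range(max_failures):
        if connect(host, port):
            return port
    # After repeated failures, ask the LB whether the shadow was restarted elsewhere.
    new_port = lookup_shadow_port()
    if new_port != port and connect(host, new_port):
        return new_port
    # Unchanged port (assumption) or a failed final attempt: give up.
    return None


# Simulated scenario: the shadow was restarted on port 9001 after a UI crash.
def shadow_alive_on_9001(host: str, port: int) -> bool:
    return port == 9001


result = reconnect_to_shadow(shadow_alive_on_9001, lambda: 9001, "ui.example.org", 9000)
```

In the simulated scenario the agent fails three times on port 9000, learns the new port from the stubbed LB lookup, and reconnects on 9001.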
New Features: Job checkpointing
Checkpointing a job during its execution means saving its state, so that the job execution can be suspended, and resumed later, starting from the same point where it was previously stopped.
The idea is to provide users with a “trivial” checkpointing service: through a proper API, a user can save the state of a job at any moment during its execution. The hypothesis is, of course, that the job can be restarted from an “intermediate” state.
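A minimal Python sketch of this kind of user-driven checkpointing is given below. It does not use the WP1 checkpointing API, whose calls are not described here: `save_state`, `load_state`, the JSON file format and the `stop_after` parameter (used only to simulate a suspension) are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state to a temporary file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_state(path, default):
    """Return the last saved state, or a copy of the default if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)


def process(items, path, stop_after=None):
    """Sum the items, saving an intermediate state after each one."""
    state = load_state(path, {"next": 0, "total": 0})
    for i in range(state["next"], len(items)):
        if stop_after is not None and i >= stop_after:
            return None  # simulated suspension of the job
        state["total"] += items[i]
        state["next"] = i + 1
        save_state(path, state)
    return state["total"]


checkpoint = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first_run = process([1, 2, 3, 4], checkpoint, stop_after=2)  # stops after two items
second_run = process([1, 2, 3, 4], checkpoint)               # resumes from item index 2
```

The second run does not repeat the first two items: it starts from the saved intermediate state, which is exactly the hypothesis stated above.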
New features: Job Partitioning
Job Partitioning takes place when a job has to process a large set of “independent elements”.
In these cases it may be worthwhile to decompose the job into smaller sub-jobs (which can be executed in parallel), in order to reduce the overall time needed to process all these elements, and to optimize the usage of all available Grid resources.
At the end, each sub-job must save a final state, which is then retrieved by a job aggregator responsible for collecting the results of the sub-jobs and producing the overall output.
This problem has been addressed in the context of job checkpointing and makes extensive use of the DAGMan mechanism.
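The partition, process and aggregate pattern can be sketched in Python as below. The sketch runs sub-jobs in local threads rather than on Grid resources and does not involve DAGMan; `partition`, `sub_job` and `aggregate` are hypothetical names, and the "final state" is simply a small dictionary.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(elements, n_subjobs):
    """Split the independent elements into at most n_subjobs contiguous chunks."""
    if not elements:
        return []
    size = -(-len(elements) // n_subjobs)  # ceiling division
    return [elements[i:i + size] for i in range(0, len(elements), size)]


def sub_job(chunk):
    """Process one chunk and return its final state."""
    return {"count": len(chunk), "sum": sum(chunk)}


def aggregate(states):
    """Job aggregator: combine the final states of all sub-jobs."""
    return {
        "count": sum(s["count"] for s in states),
        "sum": sum(s["sum"] for s in states),
    }


chunks = partition(list(range(10)), 3)
with ThreadPoolExecutor() as pool:
    final_states = list(pool.map(sub_job, chunks))
overall = aggregate(final_states)
```

Because the elements are independent, the aggregated result does not depend on how many sub-jobs are used or in which order they finish.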
New features: Job Dependencies
A job dependency exists when the execution of a program Y cannot start before program X has successfully finished.
X → Y
We consider just temporal dependencies (e.g. run job Y only when job X has finished).
We are investigating whether there are other kinds of dependencies.
It is based on Condor DAGMan
http://www.cs.wisc.edu/condor/dagman
DAGMan : Meta-Scheduler
DAGMan means Directed Acyclic Graph Manager
DAGMan is an existing solution to handle inter-job dependencies. It handles a set of jobs that must be run in a certain order.
(e.g., “Don’t run job “Y” until job “X” has completed successfully”, so there is a time order to preserve)
DAGMan navigates the graph, determines which graph nodes are free of dependencies, and follows the execution of the corresponding jobs.
DAGMan is a product developed within the Condor project
A DAGMan process is started by Condor-G for each DAG submitted to it.
DAGMan : What’s a DAG? 1/2
A DAG is the data structure used by DAGMan to represent these dependencies.
Each job (program) is a “node” in the DAG.
Each node can have any number of “parent” or “children” nodes – as long as there are no loops!
Dependencies are represented by contiguous segments called “arcs”
The arcs are directed, since there is a clear time order in which jobs should be run.
Each node consists of three parts:
A PRE-script, which is executed before the user’s job is run
A user’s job
A POST-script, which is executed after the user’s job has run
DAGMan : What’s a DAG? 2/2
The jobs (nodes) are independent: each one has its own executable, input, output, running environment, requirements, and so on.
A DAG node fails if any of its three parts fails
A whole DAG succeeds if and only if all its member jobs succeed
[Figure: Job X → (Job Y, Job W) → Job Z]
Job Z is executed only after both Job Y and Job W are completed.
In turn, Y and W both have to wait for X to be completed before being started.
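The way a DAG is navigated can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is not how DAGMan is implemented; it only illustrates the idea of repeatedly picking the nodes that are free of unfinished dependencies. The function name `run_in_waves` is hypothetical, and each arc is written as a (parent, child) pair.

```python
from graphlib import TopologicalSorter


def run_in_waves(arcs):
    """Group the jobs into successive waves of nodes with no pending dependencies."""
    sorter = TopologicalSorter()
    for parent, child in arcs:
        sorter.add(child, parent)  # child must wait for parent
    sorter.prepare()               # raises graphlib.CycleError if the graph has a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose parents have all completed
        waves.append(ready)
        sorter.done(*ready)                 # pretend these jobs completed successfully
    return waves


# The X, Y, W, Z example from the slide.
waves = run_in_waves([("X", "Y"), ("X", "W"), ("Y", "Z"), ("W", "Z")])
print(waves)  # [['X'], ['W', 'Y'], ['Z']]
```

Y and W land in the same wave because neither depends on the other, so they could run in parallel once X has finished.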
How a user can define a DAG 1/2
A DAG is specified via JDL.
A DAG consists of a ClassAd, where the attribute “JobType” is set to “DAG”, containing a set of ClassAd attributes, each one representing a job.
Arcs = <array of pairs of strings> (each pair of strings is an arc)
PreScript = <string> (the script to run before job execution)
PreScriptArguments = <array of strings> (the list of arguments for the PRE-script)
PostScript = <string> (the script to run after the job has completed)
PostScriptArguments = <array of strings> (the list of arguments for the POST-script)
Example of DAG 2/2
[
  JobType = "DAG";
  JA = [
    Executable = "JA.sh";
    PreScript = "PreJA.sh";
    PreScriptArguments = {"1"};
  ];
  JB = [
    Executable = "JB.sh";
    PostScript = "PostJB.sh";
    PostScriptArguments = {"$RETURN"};
  ];
  JC = [
    Executable = "JC.sh";
  ];
  JD = [
    Executable = "JD.sh";
    PreScript = "PreJD.sh";
    PostScript = "PostJD.sh";
    PostScriptArguments = {"1", "a"};
  ];
  Arcs = {{JA, JB}, {JA, JC}, {JB, JD}, {JC, JD}};
  …….
]
[Figure: Job A → (Job B, Job C) → Job D]
The $RETURN macro represents the exit status of JB.sh.
In general, an exit status other than zero implies that the node, and hence the whole DAG, has failed.
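The success rule for nodes and DAGs can be written down as a short Python sketch. The functions `run_node` and `dag_succeeds` are hypothetical, and the scripts are modelled as plain callables returning an exit status. Two details are assumptions, since they are not specified here: the job is skipped when the PRE-script fails, and the POST-script still runs (receiving the job's exit status, in the spirit of $RETURN) when the job itself fails.

```python
from typing import Callable, Dict, Optional


def run_node(
    job: Callable[[], int],
    pre: Optional[Callable[[], int]] = None,
    post: Optional[Callable[[int], int]] = None,
) -> bool:
    """A node succeeds only if PRE-script, job and POST-script all exit with 0."""
    if pre is not None and pre() != 0:
        return False  # assumption: the job is not started after a failed PRE-script
    status = job()
    # The POST-script is handed the job's exit status, like the $RETURN macro.
    post_status = post(status) if post is not None else 0
    return status == 0 and post_status == 0


def dag_succeeds(node_results: Dict[str, bool]) -> bool:
    """A whole DAG succeeds if and only if every member node succeeded."""
    return all(node_results.values())


# Simulated run of the four-node example, with JB exiting with status 2.
results = {
    "JA": run_node(lambda: 0, pre=lambda: 0),
    "JB": run_node(lambda: 2, post=lambda rc: rc),
    "JC": run_node(lambda: 0),
    "JD": run_node(lambda: 0, pre=lambda: 0, post=lambda rc: 0),
}
dag_ok = dag_succeeds(results)
```

In this simulated run the non-zero exit status of JB marks that node as failed, and therefore the whole DAG as failed.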
What operations can a user perform on DAGs?
dg-job-submit
Submits a DAG.
dg-job-cancel
Kills a previously submitted DAG.
All the jobs that are part of the DAG are killed.
A rescue DAG is produced.
dg-job-status
Returns the current status of the DAG.
dg-job-get-output
Retrieves the output sandbox for all the DAG member jobs, assuming that the DAG has completed.
New features: Integration with WP2 Query Optimization Service
Help the RB find the best CE based on data location
The RB will use the access cost estimation APIs provided by WP2
Trigger input data transfer
Up to now, users have had to copy all input data to where they are expected to be; there is no automatic local fetching of frequently accessed files
New features: C++ and Java API, and GUI
The C++/Java API provides a series of actions over a job or a collection of jobs, such as performing a submission, looking for a matching resource, getting the status and the logging info, retrieving the output files, and cancelling a running job. Moreover, the package makes it possible to manage proxy certificates and to create JDL files.
The GUI allows the user to:
Monitor the status of one or more jobs during its/their life cycle
Create and manage a syntax-error-safe JDL file graphically, step by step
The GUI exploits the Java API package. (There is also one in Python.)
New features: Deployment of Accounting infrastructure over Testbed
Based upon a computational economy model, users pay in order to execute their jobs on the resources, and the owners of the resources earn credits by executing the users' jobs.
There are two reasons for this:
To have a nearly stable equilibrium able to satisfy the needs of both resource providers and consumers
To credit the job's resource usage to the resource owner(s) after execution
New features: Advance reservation API
Advance reservation of resources makes it possible to achieve end-to-end quality of service (QoS) and to reduce competition for resources.
The approach is based on concepts discussed in the Global Grid Forum.
A reservation is a promise from the system that an application will receive a certain level of service from a resource (e.g., a reservation may promise a given percentage of a CPU).
The Advance reservation API is composed of:
The Reservation Agent API, which accepts a generic reservation from a user, maps it into a reservation on a specific resource, matches the requirements and preferences specified by the user, performs the allocation on the specific resource, and allows the user to use a granted reservation for his/her job.
The Resource-Dependent Reservation Agent API, which creates a reservation for the user's request, binds a reservation to run-time parameters, unbinds a reservation, cancels a reservation, modifies the parameters associated with a reservation, and returns the status of the resource reservation.
How can a user request a resource reservation ? 1/2
A resource reservation request is specified via JDL.
The attribute “Type” is set to “Reservation”.
The other attributes are:
ReservationResource (type of underlying resource)
ReservationType (used in case a resource supports different types of reservation)
ReservationStart (specify the time when the reservation may begin)
ReservationEnd (specify the time when the reservation can expire)
ReservationDuration (specify how long the reservation lasts)
ReservationParameters (specify resource-dependent parameters)
Not all the attributes are mandatory: “ReservationStart” and “ReservationEnd” default to “now” and “end time”, respectively.
Example of resource reservation request 2/2
Reservation request for three nodes for 300 seconds on a CE running Linux, whose architecture is i386:
[
  Type = "Reservation";
  ReservationResource = "computing";
  ReservationStart = 1021539656;
  ReservationEnd = 1021541000;
  ReservationDuration = 300;
  ReservationParameters = ["nodes = 3"];
  ……..
  Requirements = other.Arch == "i386" && other.OpSys == "Linux" && other.SupportReservation;
]
The time is an integer value expressing the number of seconds since the epoch, i.e. midnight on 1 January 1970 UTC.
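As a quick check of the time format, the following Python sketch converts the ReservationStart and ReservationEnd values of the example into UTC timestamps and verifies that the requested duration fits inside the window. The helper names `to_utc` and `duration_fits` are hypothetical and not part of any WP1 API.

```python
from datetime import datetime, timezone


def to_utc(seconds_since_epoch: int) -> datetime:
    """Interpret an integer reservation time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)


def duration_fits(start: int, end: int, duration: int) -> bool:
    """True if a reservation of `duration` seconds fits between start and end."""
    return 0 < duration <= end - start


reservation_start = 1021539656
reservation_end = 1021541000
reservation_duration = 300

print(to_utc(reservation_start).isoformat())  # 2002-05-16T09:00:56+00:00
print(reservation_end - reservation_start)    # 1344
print(duration_fits(reservation_start, reservation_end, reservation_duration))  # True
```

So the example asks for 300 seconds somewhere within a window of 1344 seconds that opens on 16 May 2002 at 09:00:56 UTC.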
New features: Co-allocation API
Co-allocation allows the concurrent allocation of multiple resources.
These resources can be homogeneous or heterogeneous.
The Co-allocation API is composed of:
The Co-allocation Agent API, which accepts a co-allocation request from a user, discovers resources compatible with the requirements and preferences included in all the resource descriptions, finds compatible combinations of resources that would satisfy the co-allocation request, and tries each combination.
The Application Programming Interface API, which creates a co-allocation, cancels a co-allocation (cancelling all the reservations belonging to the specified co-allocation), modifies the allocation, and returns the status of the co-allocation.
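The "find compatible combinations and try each one" step can be sketched in Python with `itertools.product`. Everything here is illustrative: `find_coallocation`, the `compatible` predicate and the `try_reserve` callback are hypothetical names, and the "same site" compatibility rule and the resource tuples are invented for the example rather than taken from the WP1 design.

```python
from itertools import product


def find_coallocation(candidates, compatible, try_reserve):
    """Return the first compatible combination that can be reserved, or None.

    candidates  -- one list of candidate resources per requested resource
    compatible  -- predicate deciding whether a combination satisfies the request
    try_reserve -- callback attempting the actual reservation of a combination
    """
    for combination in product(*candidates):
        if compatible(combination) and try_reserve(combination):
            return combination
    return None


# Hypothetical candidates, each written as a (name, site) pair.
computing_elements = [("ce1", "siteA"), ("ce2", "siteB")]
storage_elements = [("se1", "siteB")]


def same_site(combination):
    # Invented compatibility rule: all resources must belong to one site.
    return len({site for _, site in combination}) == 1


chosen = find_coallocation(
    [computing_elements, storage_elements],
    same_site,
    lambda combination: True,  # pretend every reservation attempt succeeds
)
```

With these invented candidates, the first combination (ce1 with se1) is rejected by the compatibility rule and the second one (ce2 with se1) is selected.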
How can a user request a co-allocation ? 1/2
A co-allocation request is specified via JDL.
The attribute “Type” is set to “coallocation”.
The other attributes are:
ReservationResource (type of underlying resource)
ReservationType (used in case a resource supports different types of reservation)
ReservationStart (specify the time when the reservation may begin)
ReservationEnd (specify the time when the reservation can expire)
ReservationDuration (specify how long the reservation lasts)
ReservationParameters (specify resource-dependent parameters)
Not all the attributes are mandatory: “ReservationStart” and “ReservationEnd” default to “now” and “end time” (infinite), respectively.
Example of co-allocation request 2/2
Co-allocation request for a computing node, 100 GB of storage in an SE “speaking” a certain protocol (gridFTP), and a connection of 10 MB/s between the considered CE and SE.
[
  Type = "coallocation";
  ReservationStart = 102224828;
  ReservationEnd = 1022255428;
  ReservationDuration = 3600;
  Res1 = [
    Type = "Reservation";
    ReservationResource = "computing";
    ReservationParameters = [nodes = 3; ];
    Requirements = other.Arch == "i386" && other.OpSys == "Linux" && other.SupportReservation;
    InputData = "LF:testbed0-00019";
    ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=INFN Test RC, dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
  ];
  Res2 = [
    Type = "Reservation";
    ReservationResource = "storage";
    ReservationParameters = [space = 100000; ];
    Requirements = other.Protocol == "gridftp" && other.FreeSpace > ReservationParameters.space && other.SupportReservation;
  ];
  Res3 = [
    Type = "Reservation";
    ReservationResource = "network";
    ReservationParameters = [Bandwidth = 10000; EndPoints = {Res1.CeId, Res2.SEId}];
    Requirements = other.SupportReservation;
  ];
]
New features: RB relying on the GLUE schema
Use the new CE schema for interoperability between EU Grid projects and US HEP Grid projects