Post on 03-Jan-2016
Enabling Grids for E-sciencE
CREAM-BES
Luigi Zangrando
INFN Sezione di Padova, luigi.zangrando@pd.infn.it
Supercomputing'07
EU project: RIO31844-OMII-EUROPE
What is CREAM?
• Computing Resource Execution And Management (CREAM) is a web-service-based Execution Service
– Written in Java
– Executes as an Axis container in the Tomcat application server
– CREAM is being developed in EGEE as part of the gLite middleware
• The CREAM “legacy” interface is not BES compliant
– Developed long before BES was available
• The OMII-EU project is contributing the BES interface to CREAM
CREAM-BES
• CREAM-BES is made of two separate components
– The legacy CREAM job execution server, which is being developed by the EGEE collaboration and is used in the gLite middleware
– The BES interface for CREAM, which is being developed by the OMII-Europe project
CREAM-BES and gLite
[Figure: architecture diagram with the components Legacy Interface, BES Interface, BES Engine and CREAM Core]
Legacy vs BES interfaces
• Legacy interface
– JobRegister
– JobStart
– JobCancel
– JobList
– JobLease
– JobInfo
– JobPurge
– JobSignal
– JobSuspend / JobResume
– JobProxyRenew
– GetInfo
– GetCEMonURL
– EnableAcceptJobSubmissions
– DisableAcceptJobSubmissions
– DoesAcceptJobSubmissions
• BES Interface
– CreateActivity
– TerminateActivities
– GetActivityStatuses
– GetActivityDocuments
– GetFactoryAttributesDocument
– StopAcceptingNewActivities
– StartAcceptingNewActivities
Main differences between legacy and BES interfaces
• The CREAM security model uses the concept of delegation
– Users can delegate their credentials to the CE for a limited time, so that the CE can perform actions (e.g., staging files) on behalf of the user
– BES itself does not mandate any specific security implementation
• The legacy CREAM state model is a superset of the BES one
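One way to picture the "superset" relationship is as a many-to-one mapping from legacy states onto the five states of the BES 1.0 basic state model (Pending, Running, Finished, Terminated, Failed). The sketch below is only an illustration: the legacy state names and the mapping are assumptions that do not come from the slides, and the actual CREAM-BES code may collapse states differently.

```python
# Hypothetical illustration: collapse a richer legacy state model onto the
# five states of the BES 1.0 basic state model.
BES_STATES = {"Pending", "Running", "Finished", "Terminated", "Failed"}

# ASSUMPTION: both the legacy state names and this mapping are illustrative;
# neither is taken from the slides.
LEGACY_TO_BES = {
    "REGISTERED": "Pending",
    "PENDING": "Pending",
    "IDLE": "Pending",
    "HELD": "Pending",
    "RUNNING": "Running",
    "REALLY-RUNNING": "Running",
    "DONE-OK": "Finished",
    "CANCELLED": "Terminated",
    "DONE-FAILED": "Failed",
    "ABORTED": "Failed",
}


def to_bes_state(legacy_state: str) -> str:
    """Return the BES state assumed to correspond to a legacy state."""
    try:
        return LEGACY_TO_BES[legacy_state.upper()]
    except KeyError:
        raise ValueError(f"unknown legacy state: {legacy_state!r}") from None


if __name__ == "__main__":
    for state in ("IDLE", "REALLY-RUNNING", "DONE-OK"):
        print(state, "->", to_bes_state(state))
```

Because several legacy states map to the same BES state, a BES client sees less detail than a legacy client, which is what "superset" implies.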
Standards addressed
• Basic Execution Service (BES) v1.0
• Job Submission Description Language (JSDL) v1.0
• HPC Basic Profile v1.0
• Still work in progress
– Bug fixing ongoing
– JSDL (especially Resources elements) not fully supported
– Only FTP supported for data staging, with cleartext username/password (for demo purposes only!)
Deployment
• https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes is the (root) BES container
– It acts as a front-end to different batch queues, each one with its own endpoint URI (see later)
• ftp://lxgrid04.pd.infn.it:2121/ acts as an FTP server for input/output file staging
• We interact with the CREAM-BES service using some custom Java commands
– They implement the client side of the BES interface
GetFactoryAttributesDocument
• Let us check the capabilities of the top-level BES container
• We get a textual rendering of some of the service attributes, as in the next slide
./cream-bes.sh attributes -r \
    https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes
Output
CommonName = CREAM-BES test
Long Description = CREAM-BES CE for tests
Is Accepting New Activities = true
Local Resource Manager Type = urn:lrms.type.undefined
Total number of contained resources = 2
Total Number Of Activities = 4
    activity epr = https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes
    jobId = https://omiivm03.cnaf.infn.it:8443/CREAM882464212
---------------------------- raw epr begin ----------------------------
[...]
----------------------------- raw epr end -----------------------------
[...]
Root basic resource attributes:
    No basic attributes found
Contained basic resource attributes:
    Resource Name = https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=cert
    CPUArchitecture = other
    Operating System Name = other
    Operating System Version = SLC
    CPUCount = 4.0
    CPUSpeed (Hz) = 2000.0
    Physical Memory = 1024.0
    Virtual Memory = 2048.0
Contained basic resource attributes:
    Resource Name = https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=omii
    CPUCount = 4.0
    CPUSpeed (Hz) = null
    Physical Memory = null
    Virtual Memory = null
Note
• The output shows the list of activities on the BES service
• It also shows a number of “contained resources”, each one being a specific queue
– For compatibility with the legacy CREAM, each queue is identified by the pair (Batch_System_Name, Queue_Name)
• Each resource (queue) is identified by a URI, which points to the same BES service
https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=cert
(lrms=pbs is the batch system; queue=cert is the queue name)
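Since a queue is addressed purely through the `lrms` and `queue` query parameters, a client can recover the (Batch_System_Name, Queue_Name) pair from an endpoint URI with standard URL parsing. The following is a minimal sketch using Python's standard library; the function name `parse_queue_endpoint` is hypothetical and not part of the CREAM-BES client.

```python
from urllib.parse import urlsplit, parse_qs


def parse_queue_endpoint(uri: str):
    """Split an endpoint URI into (front_end_uri, lrms, queue).

    lrms and queue are None when the URI carries no such query
    parameter, i.e. when it addresses the BES front-end itself.
    """
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    lrms = query.get("lrms", [None])[0]
    queue = query.get("queue", [None])[0]
    # Drop the query string to obtain the URI of the front-end container.
    front_end = parts._replace(query="", fragment="").geturl()
    return front_end, lrms, queue


if __name__ == "__main__":
    uri = ("https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes"
           "?lrms=pbs&queue=cert")
    print(parse_queue_endpoint(uri))
```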
Multiple batch queues
BES container (front-end): https://www.foo.org:8443/<path>
Batch System = X, Queue Name = Y: https://www.foo.org:8443/<path>?lrms=X&queue=Y
Batch System = W, Queue Name = Z: https://www.foo.org:8443/<path>?lrms=W&queue=Z
Multiple batch queues
• If the user interacts with the BES front-end
– Job submissions will be directed to one of the available queues (at the moment always the default one)
– A user can query the status of any job she owns, regardless of the queue where the job is running
• If the user interacts with a specific queue
– Job submissions will be directed to that specific queue
– The user can query the status of any job she owns, provided that the job is executing in that queue
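The two sets of rules above can be written down as a pair of small predicates. This is a sketch of the behaviour as described on the slide, not of the service's actual code; the `Job` type and the function names are hypothetical, and the name of the default queue is passed in because the slides do not say which queue it is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    owner: str
    queue: str  # name of the queue the job is running in


def target_queue(endpoint_queue: Optional[str], default_queue: str) -> str:
    """Queue that receives a submission.

    endpoint_queue is None when the user talks to the BES front-end,
    in which case the default queue is used.
    """
    return default_queue if endpoint_queue is None else endpoint_queue


def can_query(job: Job, user: str, endpoint_queue: Optional[str]) -> bool:
    """True if `user` may query `job` through the given endpoint."""
    if job.owner != user:
        return False
    # The front-end sees every queue; a queue endpoint sees only its own jobs.
    return endpoint_queue is None or job.queue == endpoint_queue


if __name__ == "__main__":
    job = Job(owner="alice", queue="cert")
    print(can_query(job, "alice", None))
    print(can_query(job, "alice", "omii"))
```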
GetFactoryAttributesDocument (for a specific queue)
./cream-bes.sh attributes -r \
    "https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=cert"

CommonName = CREAM-BES test
Long Description = CREAM-BES CE for tests
Is Accepting New Activities = true
Local Resource Manager Type = urn:lrms.type.undefined
Total number of contained resources = 0
Total Number Of Activities = 4
    activity epr = https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes
    jobId = https://omiivm03.cnaf.infn.it:8443/CREAM882464212
---------------------------- raw epr begin ----------------------------
[...]
----------------------------- raw epr end -----------------------------
Root basic resource attributes:
    Resource Name = https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=cert
    CPUArchitecture = other
    Operating System Name = other
    Operating System Version = SLC
    CPUCount = 4.0
    CPUSpeed (Hz) = 2000.0
    Physical Memory = 1024.0
    Virtual Memory = 2048.0
Submitting a test job
<?xml version="1.0" encoding="UTF-8"?>
<jsdl:JobDefinition xmlns="http://www.example.org/"
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
    xmlns:jsdl-hpcpa="http://schemas.ggf.org/jsdl/2006/07/jsdl-hpcpa"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <jsdl:JobDescription>
    <jsdl:JobIdentification>
      <jsdl:JobName>Simple Test Job</jsdl:JobName>
      <jsdl:Description>A simple job which just sleeps for 20 seconds</jsdl:Description>
    </jsdl:JobIdentification>
    <jsdl:Application>
      <jsdl:ApplicationName>sleep</jsdl:ApplicationName>
      <jsdl-hpcpa:HPCProfileApplication>
        <jsdl-hpcpa:Executable>/bin/sleep</jsdl-hpcpa:Executable>
        <jsdl-hpcpa:Argument>20</jsdl-hpcpa:Argument>
      </jsdl-hpcpa:HPCProfileApplication>
    </jsdl:Application>
    <jsdl:Resources>
      <jsdl:TotalCPUCount>
        <jsdl:LowerBoundedRange>10.0</jsdl:LowerBoundedRange>
      </jsdl:TotalCPUCount>
    </jsdl:Resources>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
Submitting a test job
./cream-bes.sh create -r \
    "https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=cert" \
    test/test_not_satisfied.jsdl

UnsupportedFeatureFault raised:
Detail Message: null
Feature: jsdl:TotalCPUCount
MessageElement:
<?xml version="1.0" encoding="UTF-8"?>
<ns2:TotalCPUCount xmlns:ns2="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <ns2:LowerBoundedRange>10.0</ns2:LowerBoundedRange>
</ns2:TotalCPUCount>
Submitting a test job
• In this case, submission fails
• UnsupportedFeatureFault
– The BES endpoint does not support the requested resource
• In particular, the BES service was unable to satisfy the resource requirements
– The job asks for at least 10 CPUs, while the cert queue advertises CPUCount = 4.0
– The body of the fault contains the offending XML fragment
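A client can anticipate this fault by comparing the job's `jsdl:TotalCPUCount` lower bound with the `CPUCount` that the queue reports in its factory attributes. The sketch below is an assumption about how such a pre-check could look on the client side; the slides do not say how the service itself decides, and the function names are hypothetical. The namespace URI is the one that appears in the fault message.

```python
import xml.etree.ElementTree as ET

# JSDL namespace, as printed in the fault's MessageElement.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"


def requested_min_cpus(jsdl_text: str):
    """Return the TotalCPUCount lower bound of a JSDL document, or None."""
    root = ET.fromstring(jsdl_text)
    node = root.find(
        f".//{{{JSDL_NS}}}TotalCPUCount/{{{JSDL_NS}}}LowerBoundedRange"
    )
    if node is None or node.text is None:
        return None
    return float(node.text)


def is_satisfiable(jsdl_text: str, available_cpus: float) -> bool:
    """ASSUMED rule: the request fits if its lower bound <= available CPUs."""
    wanted = requested_min_cpus(jsdl_text)
    return wanted is None or wanted <= available_cpus


# Trimmed-down version of the test job, keeping only the Resources part.
EXAMPLE = """<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:JobDescription>
    <jsdl:Resources>
      <jsdl:TotalCPUCount>
        <jsdl:LowerBoundedRange>10.0</jsdl:LowerBoundedRange>
      </jsdl:TotalCPUCount>
    </jsdl:Resources>
  </jsdl:JobDescription>
</jsdl:JobDefinition>"""

if __name__ == "__main__":
    # The cert queue reports CPUCount = 4.0 in its factory attributes.
    print(is_satisfiable(EXAMPLE, available_cpus=4.0))
```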
A different example
<jsdl:JobDefinition xmlns="http://www.example.org/"
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
    xmlns:jsdl-hpcpa="http://schemas.ggf.org/jsdl/2006/07/jsdl-hpcpa"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <jsdl:JobDescription>
    <!-- JobIdentification -->
    <!-- Application -->
    <!-- Resources -->
    <!-- Data Staging -->
  </jsdl:JobDescription>
</jsdl:JobDefinition>
A different example (JobIdentification)
<jsdl:JobIdentification>
  <jsdl:JobName>MD5 Computation Job</jsdl:JobName>
  <jsdl:Description>This job computes the MD5 hash of a file.</jsdl:Description>
</jsdl:JobIdentification>
A different example (Application)
<jsdl:Application>
  <jsdl:ApplicationName>md5sum</jsdl:ApplicationName>
  <jsdl-hpcpa:HPCProfileApplication>
    <jsdl-hpcpa:Executable>/usr/bin/md5sum</jsdl-hpcpa:Executable>
    <jsdl-hpcpa:Argument>fish.jpg</jsdl-hpcpa:Argument>
    <jsdl-hpcpa:Output>md5.txt</jsdl-hpcpa:Output>
  </jsdl-hpcpa:HPCProfileApplication>
</jsdl:Application>
A different example (Resources)
<jsdl:Resources>
  <jsdl:TotalCPUCount>
    <jsdl:LowerBoundedRange>1.0</jsdl:LowerBoundedRange>
  </jsdl:TotalCPUCount>
</jsdl:Resources>
A different example (Data Staging)
<jsdl:DataStaging>
  <jsdl:FileName>fish.jpg</jsdl:FileName>
  <jsdl:Source>
    <jsdl:URI>ftp://test:test@lxgrid04.pd.infn.it:2121/home/test/tests/fish.jpg</jsdl:URI>
  </jsdl:Source>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
</jsdl:DataStaging>
<jsdl:DataStaging>
  <jsdl:FileName>md5.txt</jsdl:FileName>
  <jsdl:Target>
    <jsdl:URI>ftp://test:test@lxgrid04.pd.infn.it:2121/home/test/tests/md5.txt</jsdl:URI>
  </jsdl:Target>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
</jsdl:DataStaging>
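In JSDL, a `DataStaging` element with a `Source` describes a file to fetch before the job runs (stage-in), and one with a `Target` describes a file to copy out afterwards (stage-out). The sketch below shows how a client could list both directions from a job description. The function name `staging_plan` is hypothetical, and the sample document uses placeholder `ftp.example.org` URIs rather than the demo server from the slides.

```python
import xml.etree.ElementTree as ET

JSDL_URI = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
NS = {"jsdl": JSDL_URI}


def staging_plan(jsdl_text: str):
    """Return (stage_in, stage_out) lists of (file_name, uri) pairs."""
    root = ET.fromstring(jsdl_text)
    stage_in, stage_out = [], []
    for staging in root.iter(f"{{{JSDL_URI}}}DataStaging"):
        name = staging.findtext("jsdl:FileName", namespaces=NS)
        source = staging.findtext("jsdl:Source/jsdl:URI", namespaces=NS)
        target = staging.findtext("jsdl:Target/jsdl:URI", namespaces=NS)
        if source:
            stage_in.append((name, source.strip()))
        if target:
            stage_out.append((name, target.strip()))
    return stage_in, stage_out


# Placeholder document: the host and paths are made up for illustration.
SAMPLE = """<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:JobDescription>
    <jsdl:DataStaging>
      <jsdl:FileName>fish.jpg</jsdl:FileName>
      <jsdl:Source><jsdl:URI>ftp://ftp.example.org/in/fish.jpg</jsdl:URI></jsdl:Source>
    </jsdl:DataStaging>
    <jsdl:DataStaging>
      <jsdl:FileName>md5.txt</jsdl:FileName>
      <jsdl:Target><jsdl:URI>ftp://ftp.example.org/out/md5.txt</jsdl:URI></jsdl:Target>
    </jsdl:DataStaging>
  </jsdl:JobDescription>
</jsdl:JobDefinition>"""

if __name__ == "__main__":
    files_in, files_out = staging_plan(SAMPLE)
    print("stage in: ", files_in)
    print("stage out:", files_out)
```

For the MD5 job this gives one stage-in (`fish.jpg`, the input of `md5sum`) and one stage-out (`md5.txt`, the file named in `jsdl-hpcpa:Output`).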
CreateActivity
./cream-bes.sh create -r \
    "https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=cream_1" \
    test/test_convert.jsdl

---------------------------- raw epr begin ----------------------------
[...]
----------------------------- raw epr end -----------------------------
Activity Identifier address: https://omiivm03.cnaf.infn.it:8443/ce-cream/services/CreamBes
Activity Identifier Reference Parameters: https://omiivm03.cnaf.infn.it:8443/CREAM183524481
GetActivityStatus
./cream-bes.sh status -r \
    "https://cream-ce-01.pd.infn.it:8443/ce-cream/services/CreamBes?lrms=pbs&queue=cream_1" \
    https://omiivm03.cnaf.infn.it:8443/CREAM183524481

Retrieved 1 activities
activity epr = https://cream-ce-01.pd.infn.it:8443/ce-cream/services/CreamBes
    jobId = https://omiivm03.cnaf.infn.it:8443/CREAM183524481
    activity status = Finished
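A script that waits for a job to complete needs to extract the `activity status` value from this output. The sketch below parses the textual rendering, assuming each `key = value` pair is printed on its own line. That layout is inferred from the slide only and is not a documented output format of `cream-bes.sh`; the function names are hypothetical. The set of terminal states is taken from the BES 1.0 basic state model.

```python
import re

# Matches the three "key = value" lines shown in the status output.
STATUS_LINE = re.compile(
    r"^\s*(activity epr|jobId|activity status)\s*=\s*(.+?)\s*$"
)

# Terminal states of the BES 1.0 basic state model.
TERMINAL_STATES = {"Finished", "Terminated", "Failed"}


def parse_status_output(text: str):
    """Parse the status output into a list of dicts, one per activity.

    ASSUMPTION: each activity starts with an "activity epr" line, followed
    by its "jobId" and "activity status" lines.
    """
    activities = []
    current = None
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "activity epr":
            current = {}
            activities.append(current)
        if current is not None:
            current[key] = value
    return activities


def is_done(activity: dict) -> bool:
    """True once the activity has reached a terminal BES state."""
    return activity.get("activity status") in TERMINAL_STATES


if __name__ == "__main__":
    sample = (
        "Retrieved 1 activities\n"
        "activity epr = https://cream-ce-01.pd.infn.it:8443/ce-cream/services/CreamBes\n"
        "    jobId = https://omiivm03.cnaf.infn.it:8443/CREAM183524481\n"
        "    activity status = Finished\n"
    )
    for activity in parse_status_output(sample):
        print(activity["jobId"], is_done(activity))
```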