The middleware
EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org, www.glite.org
Based on material by Sergio Andreozzi, INFN-CNAF
OMII-Europe All-Hands Meeting, Bologna, 12-13 February 2007
Disclaimer
• This presentation is based on materials provided and authorized by the EGEE project and is freely available to download and use according to the terms of the following license:
http://creativecommons.org/licenses/by-nc-sa/2.5/
OUTLINE
• The EGEE Project
  – Objective
  – Relationship to other projects
• The gLite middleware
  – Middleware decomposition
    Foundation
    High-level services
Part I: The EGEE Project
The EGEE project
• EGEE
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II
  – 1 April 2006 – 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
• EGEE-III
  – 1 April 2008 – 31 March 2010
  – More than 120 partners
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Improving and maintaining the “gLite” Grid middleware
US partners in EGEE-II:
• Univ. Chicago
• Univ. South. California
• Univ. Wisconsin
• RENCI
Main lines of the EGEE project
• Infrastructure operation
  – Currently includes sites across 39 countries
  – Continuous monitoring of grid services & automated site configuration/management
• Middleware
  – Production quality middleware distributed under a business-friendly open source licence
• User Support - Managed process from first contact through to production usage
  – Training
  – Expertise in grid-enabling applications
  – Online helpdesk
  – Networking events (User Forum, Conferences etc.)
• Interoperability
  – Expanding geographical reach and interoperability with related infrastructures
[Figure: logos of related infrastructures; legible labels: TWGRID, KnowARC]
Applications on EGEE
• Applications from an increasing number of domains
  – Astrophysics
  – Computational Chemistry
  – Earth Sciences
  – Financial Simulation
  – Fusion
  – Geophysics
  – High Energy Physics
  – Life Sciences
  – Multimedia
  – Material Sciences
  – …
Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf
EU projects related to EGEE
[Figure: diagram of related EU projects; project labels not recoverable from the transcript]
Sustainability: Beyond EGEE-II
• Need to prepare for permanent Grid infrastructure
  – Ensure a reliable and adaptive support for all sciences
  – Independent of short project funding cycles
  – Infrastructure managed in collaboration with national grid initiatives
Part II: The gLite middleware
Programming the Grid with gLite: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
Middleware structure
• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
  – Must be complete and robust
  – Should allow interoperation with other major grid infrastructures
  – Should not assume the use of Higher-Level Grid Services
[Figure: layers of the architecture]
Applications
Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring
Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
gLite Services Decomposition
6 High Level Services + CLI & API
Legend:
• Available
• Foreseen in the architecture (only Job Provenance will be available by the end of EGEE-II)
gLite components
• UI: User Interface
• CE: Computing Element
• SE: Storage Element
• WN: Worker Node
• WMS: Workload Management System
• VOMS: Virtual Organization Membership Service
• LB: Logging and Bookkeeping
• MonBOX: monitoring
• LFC: Logical File Catalog
• BDII: Berkeley Database Information Index, stores all information about the resources available in the grid infrastructure
Job Workflow in gLite
[Figure: job workflow diagram. Components: UI (JDL), Resource Broker, Job Submission Service, Logging & Book-keeping, Information Service, LFC Catalog, Computing Element, Storage Element. Arrow labels: voms-proxy-init, Author. & Authen., Job Submit, Event, Job Query, Job Status, Expanded JDL, Input “sandbox”, Input “sandbox” + Broker Info, Globus RSL, Output “sandbox”, DataSets info, SE & CE info, Publish]
Job Workflow in gLite
[Figure: job workflow diagram, second version. Same components and arrow labels as on the previous slide, except that the information component is labelled “Information Index” and a WMProxy annotation is added]
High Level Services: Workload Management
• Resource brokering, workflow management, I/O data management
  – Web Service interface: WMProxy
  – Task Queue: keeps non-matched jobs
  – Information SuperMarket: optimized cache of the information system
  – Match Maker: assigns jobs to resources according to user requirements
  – Job submission & monitoring
    Condor-G
    ICE (to CREAM)
  – External interactions:
    Information System
    Data Catalogs
    Logging & Bookkeeping
    Policy Management system (G-PBox)
• CREAM: Computing Resource Execution and Management
• ICE: Interface to CREAM Environment
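The roles of the Task Queue, the Information SuperMarket and the Match Maker listed above can be illustrated with a minimal sketch. Every name in it (Resource, Job, match, schedule, and the fields free_cpus and max_wall_minutes) is hypothetical and not part of the gLite API. In gLite, requirements and rank are JDL expressions; here they are replaced by plain Python callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Resource:
    """Cached description of a computing resource (hypothetical fields)."""
    name: str
    free_cpus: int
    max_wall_minutes: int


@dataclass
class Job:
    """A job with a requirements predicate and a rank function (both assumed)."""
    job_id: str
    requirements: Callable[[Resource], bool]
    rank: Callable[[Resource], float]


def match(job: Job, supermarket: Dict[str, Resource]) -> Optional[Resource]:
    """Return the best-ranked resource that satisfies the job, or None."""
    candidates = [r for r in supermarket.values() if job.requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=job.rank)


def schedule(jobs: List[Job],
             supermarket: Dict[str, Resource]) -> Tuple[Dict[str, str], List[Job]]:
    """Assign matched jobs; keep non-matched jobs in a task queue for later."""
    assigned: Dict[str, str] = {}
    task_queue: List[Job] = []
    for job in jobs:
        best = match(job, supermarket)
        if best is None:
            task_queue.append(job)
        else:
            assigned[job.job_id] = best.name
    return assigned, task_queue


# The "supermarket" stands in for the cached view of the information system.
supermarket = {
    "ce-a": Resource("ce-a", free_cpus=4, max_wall_minutes=600),
    "ce-b": Resource("ce-b", free_cpus=32, max_wall_minutes=120),
}
jobs = [
    Job("short", lambda r: r.max_wall_minutes >= 60, lambda r: r.free_cpus),
    Job("long", lambda r: r.max_wall_minutes >= 1440, lambda r: r.free_cpus),
]
assigned, task_queue = schedule(jobs, supermarket)
print(assigned)
print([job.job_id for job in task_queue])
```

The job named "long" finds no resource with a sufficient wall-time limit, so it stays in the task queue and could be matched again once the cached resource information changes.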
Grid Foundation: Security
• Authentication based on X.509 PKI infrastructure
  – Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
    Commonly used in web browsers to authenticate to sites
  – Trust between CAs and sites is established (offline)
  – In order to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates
• Proxies can
  – Be delegated to a service such that it can act on the user’s behalf
  – Include additional attributes (like VO information via the VO Membership Service VOMS)
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (in case they are about to expire)
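The relationship between a long-lived certificate, short-lived proxies, delegation and renewal can be sketched as follows. This is a simplified model, not real X.509 handling: the Credential class, the subject strings and the one-hour renewal margin are assumptions made for illustration, and no cryptography is involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """A certificate or proxy, reduced to subject and expiry (assumed model)."""
    subject: str
    not_after: datetime
    issuer: Optional["Credential"] = None  # None for the long-lived certificate


def effective_expiry(cred: Credential) -> datetime:
    """A delegated proxy is usable only while every link in its chain is valid."""
    expiry = cred.not_after
    while cred.issuer is not None:
        cred = cred.issuer
        expiry = min(expiry, cred.not_after)
    return expiry


def needs_renewal(cred: Credential, now: datetime, margin: timedelta) -> bool:
    """True when the credential expires within `margin` of `now`."""
    return effective_expiry(cred) - now <= margin


now = datetime(2007, 2, 12, 9, 0, tzinfo=timezone.utc)
user_cert = Credential("CN=Some User", now + timedelta(days=365))
proxy = Credential("CN=Some User/CN=proxy", now + timedelta(hours=12), user_cert)
# A proxy delegated to a service cannot outlive the proxy it was derived from.
delegated = Credential("CN=Some User/CN=proxy/CN=proxy",
                       now + timedelta(hours=24), proxy)

margin = timedelta(hours=1)
print(needs_renewal(proxy, now, margin))
print(needs_renewal(proxy, now + timedelta(hours=11, minutes=30), margin))
print(effective_expiry(delegated) == proxy.not_after)
```

A service holding the delegated proxy would run a check like needs_renewal periodically and ask a proxy store for a fresh proxy before the margin is reached.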
Grid Foundation: Security
• Local Centre Authorization Service (LCAS) handles authorization requests to the local computing fabric
• Local Credential Mapping Service (LCMAPS) provides all local credentials needed for jobs allowed into the fabric.
• Batch Local ASCII Helper
  – The protocol (BLAHP): provides a set of plain ASCII commands used by Condor-C (and CREAM) to manage jobs on the batch systems.
  – The daemon (BLAHPD): implements the helper daemon responsible for converting BLAHP commands into batch system actions, interpreting their results and reporting them in BLAHP format.
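The split between authorization (the role given to LCAS above) and credential mapping (the role given to LCMAPS) can be sketched as a two-step admission check. The class below is a toy model: SiteGatekeeper, its methods, the pool-account names and the ban list are all hypothetical and do not reflect the actual LCAS or LCMAPS plug-in interfaces.

```python
from typing import Dict, List, Optional, Set


class SiteGatekeeper:
    """Toy two-step admission: authorize first, then map to a local account."""

    def __init__(self, allowed_vos: Set[str], banned_subjects: Set[str],
                 pool_accounts: Dict[str, List[str]]):
        self.allowed_vos = allowed_vos
        self.banned_subjects = banned_subjects
        # Free local accounts per VO (hypothetical pool-account layout).
        self.free_accounts = {vo: list(accts) for vo, accts in pool_accounts.items()}
        self.leases: Dict[str, str] = {}  # subject -> local account

    def authorize(self, subject: str, vo: str) -> bool:
        """Authorization step: is this request allowed into the fabric?"""
        return vo in self.allowed_vos and subject not in self.banned_subjects

    def map_credentials(self, subject: str, vo: str) -> Optional[str]:
        """Mapping step: pick the local account the job will run under."""
        if subject in self.leases:
            return self.leases[subject]  # reuse the existing mapping
        free = self.free_accounts.get(vo, [])
        if not free:
            return None
        self.leases[subject] = free.pop(0)
        return self.leases[subject]

    def admit(self, subject: str, vo: str) -> Optional[str]:
        """Run both steps; None means the job is not admitted."""
        if not self.authorize(subject, vo):
            return None
        return self.map_credentials(subject, vo)


gatekeeper = SiteGatekeeper(
    allowed_vos={"examplevo"},
    banned_subjects={"CN=Banned User"},
    pool_accounts={"examplevo": ["examplevo001", "examplevo002"]},
)
print(gatekeeper.admit("CN=Alice", "examplevo"))
print(gatekeeper.admit("CN=Alice", "examplevo"))
print(gatekeeper.admit("CN=Bob", "examplevo"))
print(gatekeeper.admit("CN=Carol", "othervo"))
print(gatekeeper.admit("CN=Banned User", "examplevo"))
```

Keeping the two steps separate means a site can change who is allowed in without touching how admitted users are mapped to local accounts.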
Grid foundation: Information Systems
• Generic Information Provider (GIP)
  – Provides LDIF information about a grid service in accordance with the GLUE Schema
• BDII: Information system in gLite 3.0 (by LCG)
  – LDAP database that is updated by an external process
  – More than one DB is used to separate reads and writes
  – A port forwarder is used internally to select the correct DB
[Figure: GIP structure; labels: GIP, Provider, Config File, LDIF File, Plugin, Cache]
• LDIF: LDAP Data Interchange Format
• LDAP: Lightweight Directory Access Protocol
• GLUE: Grid Laboratory Uniform Environment
• BDII: Berkeley Database Information Index
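To make the LDIF output of an information provider more concrete, the sketch below parses a small LDIF fragment into Python dictionaries. The sample entries are invented: the attribute names follow GLUE 1.x naming as an illustration, and the host name, DN suffix and values are assumptions. The parser is deliberately minimal and does not handle comments, folded lines or base64-encoded values, all of which real LDIF allows.

```python
from typing import Dict, List

# Invented sample data in the style of GLUE 1.x CE entries.
SAMPLE_LDIF = """\
dn: GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-short,mds-vo-name=local,o=grid
GlueCEUniqueID: ce.example.org:2119/jobmanager-pbs-short
GlueCEStateFreeCPUs: 12
GlueCEStateWaitingJobs: 3

dn: GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-long,mds-vo-name=local,o=grid
GlueCEUniqueID: ce.example.org:2119/jobmanager-pbs-long
GlueCEStateFreeCPUs: 0
GlueCEStateWaitingJobs: 40
"""


def parse_ldif(text: str) -> List[Dict[str, List[str]]]:
    """Parse blank-line-separated records into attribute -> list-of-values maps."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry: Dict[str, List[str]] = {}
        for line in block.splitlines():
            # Split on the first ": " so colons inside values are preserved.
            attr, _, value = line.partition(": ")
            entry.setdefault(attr, []).append(value)
        entries.append(entry)
    return entries


entries = parse_ldif(SAMPLE_LDIF)
free_cpus = {e["GlueCEUniqueID"][0]: int(e["GlueCEStateFreeCPUs"][0])
             for e in entries}
for ce_id, free in free_cpus.items():
    print(ce_id, free)
```

A production client would normally query the LDAP server with an LDAP library rather than parse LDIF text by hand; the sketch only shows the shape of the published data.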
Grid foundation: Information Systems
• R-GMA: provides a uniform method to access and publish distributed information and monitoring data
  – Used for job and infrastructure monitoring in gLite 3.0
  – Working to add authorization
• Service Discovery:
  – Provides a standard set of methods for locating Grid services
  – Currently supports R-GMA, BDII and XML files as backends
  – Will add local cache of information
  – Used by some DM and WMS components in gLite 3.0
Grid foundation: Computing Element
• Three flavours available now:
  – LCG-CE (GT2 GRAM)
    In production now but will be phased out next year
  – gLite-CE (GSI-enabled Condor-C)
    Already deployed but still needs thorough testing and tuning. Being done now
  – CREAM (WS-I based interface)
    Deployed on the JRA1 preview test-bed. After a first testing phase will be certified and deployed together with the gLite-CE
    Our contribution to the OGF-BES group for a standard WS-I based CE interface
    CREAM and WMProxy demo at SC06!
• BLAH is the interface to the local resource manager (via plug-ins)
  – CREAM and gLite-CE
  – Information pass-through: pass parameters to the LRMS to help job scheduling
[Figure: Computing Element diagram; labels: WMS, Clients; Grid; Site; Computing Element; glexec + LCAS/LCMAPS; BLAH; LRMS; WN; Information System; bdII; R-GMA; CEMon]
Grid foundation: Accounting
• APEL: Uses R-GMA to propagate and display job accounting information for infrastructure monitoring
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS: Collects, stores and transfers accounting data. Compliant with privacy requirements
  – Reads LRMS log files provided by LCG-CE and BLAH.
  – Stores information in a site database (HLR) and optionally in a central HLR. Access granted to user, site and VO administrators
  – Not yet certified in gLite 3.0. Deployment plan:
    DGAS is in certification at INFN
    It will send records to the GOC via DGAS2APEL
Grid foundation: Storage Element
• Storage Element
  – Common interface: SRM v1, migrating to SRM v2.2
  – Various implementations from LCG and other external projects
    disk-based: DPM, dCache / tape-based: Castor, dCache
  – Support for ACLs in DPM (in future in Castor and dCache)
    After the summer: synchronization of ACLs between SEs
  – Common rfio library for Castor and DPM being added
• Posix-like file access:
  – Grid File Access Layer (GFAL) by LCG
    Support for ACL in the SRM layer (currently in DPM only)
    Support for SRMv2 being added now
  – gLite I/O
    Support for ACLs from the file catalog and interfaced to Hydra for data encryption
    Not certified in gLite 3.0. To be retired once all its functionality is also available in GFAL.
High Level Services: Catalogues
• File Catalogs
  – LFC from LCG
    In June: interface to POOL. In the summer: LFC replication and backup.
  – Hydra: stores keys for data encryption
    Being interfaced to GFAL (done by July)
    Currently only one instance, but in future there will be 3 instances: at least 2 need to be available for decryption.
    Not yet certified in gLite 3.0. Certification will start soon.
  – AMGA Metadata Catalog: generic metadata catalogue
    Joint JRA1-NA4 (ARDA) development. Used mainly by Biomed
    Not yet certified in gLite 3.0. Certification will start soon.
High Level Services: File transfer
• FTS: Reliable, scalable and customizable file transfer
  – Manages transfers through channels
    mono-directional network pipes between two sites
  – Web service interface
  – Automatic discovery of services
  – Support for different user and administrative roles
  – Adding support for pre-staging and a new proxy renewal scheme
  – Support for SRMv2.2, delegation, VOMS-aware proxy renewal in certification
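The idea of a channel as a mono-directional pipe between two sites can be sketched with one queue per (source, destination) pair. This is an illustration only: ChannelManager, Transfer and the site names are hypothetical and do not correspond to the FTS web service interface.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, NamedTuple, Optional, Tuple


class Transfer(NamedTuple):
    source_site: str
    dest_site: str
    filename: str


class ChannelManager:
    """Queues transfers on one-way (source, destination) channels."""

    def __init__(self, channels: Iterable[Tuple[str, str]]):
        # A channel (A, B) does not imply that a channel (B, A) exists.
        self.channels = set(channels)
        self.queues: Dict[Tuple[str, str], Deque[Transfer]] = defaultdict(deque)

    def submit(self, transfer: Transfer) -> bool:
        """Queue the transfer if a channel exists in that direction."""
        key = (transfer.source_site, transfer.dest_site)
        if key not in self.channels:
            return False
        self.queues[key].append(transfer)
        return True

    def next_transfer(self, source: str, dest: str) -> Optional[Transfer]:
        """Pop the oldest queued transfer on a channel, if any."""
        queue = self.queues.get((source, dest))
        return queue.popleft() if queue else None


manager = ChannelManager([("SITE-A", "SITE-B")])
accepted = manager.submit(Transfer("SITE-A", "SITE-B", "run001.dat"))
rejected = manager.submit(Transfer("SITE-B", "SITE-A", "run002.dat"))
print(accepted, rejected)
print(manager.next_transfer("SITE-A", "SITE-B"))
print(manager.next_transfer("SITE-A", "SITE-B"))
```

Because each direction has its own queue, a site administrator could in principle throttle or pause one direction without affecting the other.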
High Level Services: Workload mgmt.
• WMS helps the user access computing resources
  – Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
  – To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
  – Management of complex workflows (DAGs) and compound jobs
    bulk submission and shared input sandboxes
    support for input files on different servers (scattered sandboxes)
  – Support for shallow resubmission of jobs
  – Job File Perusal: file peeking during job execution
  – Supports collection of information from CEMon, BDII, R-GMA and from DLI and StorageIndex data management interfaces
  – Support for parallel jobs (MPI) when the home dir is not shared
  – Deployed for the first time in gLite 3.0
WMS/LB/UI and CE
• New WMS deployed and thoroughly debugged
  – CMS: 100 collections * 200 jobs/collection, 3 UIs, 33 CEs
    ~ 2.5 h to submit jobs
      • 0.5 seconds/job
    ~ 17 hours to transfer jobs to a CE
      • 3 seconds/job
      • 26K jobs/day
    Negligible failure rate due to WMS
  – Shallow resubmission
    failure rate drops to less than 1% with 3 resubmissions
• Stability problems
  – investigating also other deployment scenarios to make it more robust
• gLite CE still to be tested and optimized
[Figure: “Done (Success) jobs after ith Submission”; x-axis: Number of Submission (0 to 6); y-axis: % (0 to 100); series: ATLAS, CMS]
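The effect of resubmission on the overall failure rate can be illustrated with a small model. It assumes that every submission fails independently with the same probability, which is an assumption of this sketch and not a claim from the slides; the 30% figure is likewise an invented example. The sketch also does not model what distinguishes shallow from deep resubmission, only the retry budget.

```python
from typing import Callable, Tuple


def residual_failure_rate(p_fail: float, resubmissions: int) -> float:
    """Probability that the first attempt and every resubmission all fail,
    assuming independent attempts with the same failure probability."""
    return p_fail ** (resubmissions + 1)


def run_with_resubmission(attempt: Callable[[int], bool],
                          max_resubmissions: int) -> Tuple[bool, int]:
    """Call attempt(i) until it succeeds or the resubmission budget is spent.

    Returns (succeeded, number_of_submissions_made).
    """
    for i in range(max_resubmissions + 1):
        if attempt(i):
            return True, i + 1
    return False, max_resubmissions + 1


# Illustrative number only: a 30% per-attempt failure rate.
rate = residual_failure_rate(0.30, 3)
print(rate < 0.01)

# A job that fails twice and then succeeds on the third submission.
outcome = run_with_resubmission(lambda i: i >= 2, max_resubmissions=3)
print(outcome)
```

Under these assumptions even a fairly unreliable single attempt ends up below a 1% residual failure rate after three resubmissions, which is consistent in spirit with the figure quoted on the slide.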
High Level Services: Workflows
• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
• A Collection is a group of jobs with no dependencies
  – basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
  – Submission time reduction
    Single call to WMProxy server
    Single Authentication and Authorization process
    Sharing of files between jobs
  – Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD, nodeE; edges not recoverable from the transcript]
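The dependency structure of a DAG job determines which nodes may run before which others. The sketch below uses Python's standard graphlib module to compute one valid execution order. The node names are taken from the figure on the slide, but the edges are invented for illustration, since the figure's edges are not recoverable.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Maps each node to the set of nodes it depends on (edges are assumed).
dependencies: Dict[str, Set[str]] = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeC"},
}


def respects_dependencies(order: List[str], deps: Dict[str, Set[str]]) -> bool:
    """Check that every node appears after all the nodes it depends on."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[dep] < position[node]
               for node, node_deps in deps.items() for dep in node_deps)


# static_order() yields the nodes so that dependencies always come first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
print(respects_dependencies(order, dependencies))
```

A Collection corresponds to the special case where every dependency set is empty, so any order (or full parallelism) is valid.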
High Level Services: Job Information
• Logging and Bookkeeping service
  – Tracks jobs during their lifetime (in terms of events)
  – LBProxy for fast access
  – L&B API and CLI to query jobs
  – Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds back to WMS to aid planning
• Job Provenance: stores long term job information
  – Supports job rerun
  – If deployed will also help offload the L&B
  – Not yet certified in gLite 3.0.
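The “CE reputability ranking” idea, keeping recent failure statistics per CE and feeding them back into planning, can be sketched with a sliding window of job outcomes. The class name, the window size and the ranking rule are assumptions made for this illustration; the actual L&B implementation may compute its statistics differently.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class CEReputation:
    """Keeps the most recent job outcomes per CE and ranks CEs by failure rate."""

    def __init__(self, window: int = 100):
        self.window = window
        self.outcomes: Dict[str, Deque[bool]] = defaultdict(
            lambda: deque(maxlen=self.window))

    def record(self, ce: str, succeeded: bool) -> None:
        """Store one job outcome; the oldest one drops out when the window is full."""
        self.outcomes[ce].append(succeeded)

    def failure_rate(self, ce: str) -> float:
        results = self.outcomes.get(ce)
        if not results:
            return 0.0  # no recent history: treated as no known failures
        return results.count(False) / len(results)

    def ranked(self) -> List[str]:
        """CEs ordered from lowest to highest recent failure rate."""
        return sorted(self.outcomes, key=lambda ce: (self.failure_rate(ce), ce))


reputation = CEReputation(window=4)
for ok in (True, True, False, True):
    reputation.record("ce-a", ok)
for ok in (False, False, True, False, False):  # first outcome falls out of the window
    reputation.record("ce-b", ok)

print(reputation.failure_rate("ce-a"))
print(reputation.failure_rate("ce-b"))
print(reputation.ranked())
```

A match maker could use such a ranking as one input to its rank expression, so that CEs with many recent failures are chosen less often.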
Highlights: Job Priorities
• Applications ask for differentiated access to fast/slow queues depending on the user's role/group inside the VO
• GPBOX is a tool that provides the possibility to define, store and propagate fine-grained VO policies
  – based on VOMS groups and roles
  – enforcement of policies at sites: sites may accept/reject policies
  – Not yet certified. Certification will start when requested by the TCG.
• Current activities: test job prioritization without GPBOX:
  – Map VOMS groups to batch system shares
  – Publish info on the share in the CE GLUE 1.2 schema (VOView)
  – WMS match-making depending on submitter VOMS certificate
  – Settings are not dynamic (via e-mail or CE updates)
  – GIP available for Torque/Maui only. Working on the LSF one - mainly a deployment issue
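Mapping VOMS groups and roles to batch system shares can be sketched as a lookup from an attribute string to a share name. The attribute strings below follow the usual VOMS form /vo/group/Role=role; the VO name, the share names and the fallback rule (try the role-specific key, then the group, then the parent group) are assumptions made for this illustration and not a description of how a gLite CE is configured.

```python
from typing import Dict, Optional, Tuple


def parse_fqan(fqan: str) -> Tuple[str, Optional[str]]:
    """Split '/vo/group/Role=role' into (group_path, role or None)."""
    role = None
    groups = []
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif not part.startswith("Capability="):
            groups.append(part)
    return "/" + "/".join(groups), role


def share_for(fqan: str, shares: Dict[str, str]) -> Optional[str]:
    """Return the share of the most specific matching group or group/role key."""
    group, role = parse_fqan(fqan)
    while True:
        if role is not None and f"{group}/Role={role}" in shares:
            return shares[f"{group}/Role={role}"]
        if group in shares:
            return shares[group]
        if group.count("/") <= 1:
            return None  # reached the VO level without a match
        group = group.rsplit("/", 1)[0]  # fall back to the parent group


# Hypothetical site configuration: attribute key -> batch system share.
shares = {
    "/examplevo": "default",
    "/examplevo/production": "fast",
    "/examplevo/Role=admin": "admin-share",
}

print(share_for("/examplevo/production/Role=NULL/Capability=NULL", shares))
print(share_for("/examplevo/analysis", shares))
print(share_for("/examplevo/Role=admin", shares))
print(share_for("/othervo", shares))
```

Because the table is static, changing a mapping means editing the site configuration, which matches the slide's remark that the settings are not dynamic.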
Summary
• gLite 3 is
  – the next generation middleware for grid computing
  – developed according to a well defined process
    controlled by the EGEE Technical Coordination Group
  – deployed on the EGEE production infrastructure
    More than 200 sites
  – under continuing development to provide increased robustness, usability, and functionality
    On the preview testbed
      • CREAM, Job Provenance, glexec on the WNs, GPBOX
  – gLite sources: http://glite.cvs.cern.ch/cgi-bin/glite.cgi/
www.glite.org