
gLite, the next generation middleware for Grid computing

Oxana Smirnova (Lund/CERN)

Nordic Grid Neighborhood Meeting

Linköping, October 20, 2004

Uses material from E. Laure and F. Hemmer

2

gLite

• What is gLite (quotes from http://www.glite.org):
  “the next generation middleware for grid computing”
  “collaborative efforts of more than 80 people in 10 different academic and industrial research centers”
  “part of the EGEE project (http://www.eu-egee.org)”
  “bleeding-edge, best-of-breed framework for building grid applications tapping into the power of distributed computing and storage resources across the Internet”

• [Figure: EGEE Activity Areas]

• Nordic contributors: HIP, PDC, UiB

3

Architecture guiding principles

• Lightweight services
  Easily and quickly deployable
  Use existing services where possible as a basis for re-engineering
  “Lightweight” does not mean fewer services or non-intrusiveness – it means modularity

• Interoperability
  Allow for multiple implementations

• Performance/Scalability & Resilience/Fault Tolerance
  Large-scale deployment and continuous usage

• Portability
  Being built on Scientific Linux and Windows

• Co-existence with deployed infrastructure
  Reduce requirements on participating sites
  Flexible service deployment
  Multiple services running on the same physical machine (if possible)
  Co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid service

• Service-oriented approach

• 60+ external dependencies

4

Service-oriented approach

• By adopting the Open Grid Services Architecture, with components that are:
  Loosely coupled (via messages)
  Accessible across the network; modular and self-contained; clean modes of failure
  Able to change implementation without changing interfaces
  Able to be developed in anticipation of new use cases

• Follow WSRF standardization
  No mature WSRF implementations exist to date, so start with plain WS (see the sketch below)

• WSRF compliance is not an immediate goal, but the WSRF evolution is followed

• WS-I compliance is important
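As a rough illustration of the plain-WS starting point, the sketch below posts a SOAP request to a web service over HTTP from Python. The endpoint URL, XML namespace and operation name are hypothetical placeholders for illustration only; they are not actual gLite interfaces.

# Minimal sketch: calling a plain (pre-WSRF) web service over SOAP/HTTP.
# Endpoint, namespace and operation name are hypothetical placeholders.
import urllib.request

ENDPOINT = "https://wms.example.org:8443/glite/JobSubmission"  # hypothetical

soap_request = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <submitJob xmlns="urn:example:wms">
      <jobDescription>Executable = "/bin/hostname";</jobDescription>
    </submitJob>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=soap_request.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "urn:example:wms#submitJob"},
)

with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # raw SOAP response envelope

Because the services are loosely coupled behind such message interfaces, the implementation behind the endpoint can change without affecting this kind of client.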

5

gLite vs LCG-2

[Figure: evolution from LCG-1 and LCG-2 (Globus 2 based) to gLite-1 and gLite-2 (Web services based)]

• Intended to replace LCG-2

• Starts with existing components

• Aims to address LCG-2 shortcomings and advanced needs from applications (in particular feedback from the Data Challenges)

• Prototyping: short development cycles for fast user feedback

• Initial web-services based prototypes being tested with representatives from the application groups

6

Approach

• Exploit experience and components from existing projects: AliEn, VDT, EDG, LCG, and others

• Design team works out the architecture and design
  Architecture: https://edms.cern.ch/document/476451
  Design: https://edms.cern.ch/document/487871/
  Feedback and guidance from EGEE PTF, EGEE NA4, LCG GAG, LCG Operations, LCG ARDA

• Components are initially deployed on a prototype infrastructure
  Small scale (CERN & Univ. Wisconsin)
  Get user feedback on service semantics and interfaces

• After internal integration and testing, components are to be deployed on the pre-production service

[Figure: components drawn from AliEn, EDG, VDT, LCG, …]

7

Subsystems/components

[Figure: LCG-2 components vs. gLite services (drawing on AliEn)]

• User Interface
• Computing Element
• Worker Node
• Workload Management System
• Package Management
• Job Provenance
• Logging and Bookkeeping
• Data Management
• Information & Monitoring
• Job Monitoring
• Accounting
• Site Proxy
• Security
• Fabric management

8

Workload Management System

9

Computing Element

• Works in push or pull mode (a pull-mode sketch follows after this list)

• Site policy enforcement

• Exploits the new Globus Gatekeeper and Condor-C (close interaction with the Globus and Condor teams)

[Figure: Computing Element internals]
CEA … Computing Element Acceptance
JC … Job Controller
MON … Monitoring
LRMS … Local Resource Management System
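The following Python sketch illustrates the pull mode mentioned above, assuming hypothetical task_queue, lrms and site_policy objects; it is not the actual gLite CE interface.

# Illustrative sketch (not the actual gLite CE interface): a pull-mode CE
# periodically asks the central task queue for work that matches its local
# resources, then hands the job to the local resource management system (LRMS).
import time

def pull_mode_ce(task_queue, lrms, site_policy, poll_interval=30):
    """Hypothetical pull loop; in push mode the WMS would contact the CE instead."""
    while True:
        # Advertise what this site can run and ask for a matching job.
        job = task_queue.request_job(free_slots=lrms.free_slots(),
                                     policy=site_policy)
        if job is not None and site_policy.accepts(job):  # CE Acceptance (CEA)
            lrms.submit(job)                              # e.g. PBS/LSF/Condor
        time.sleep(poll_interval)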

10

Data Management

• Scheduled data transfers (treated like jobs)

• Reliable file transfer (a retry sketch follows after this list)

• Site self-consistency

• SRM-based storage
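As an illustration of what “reliable file transfer” can mean in practice, the sketch below retries a transfer with exponential back-off; transfer() is a hypothetical stand-in (e.g. for a GridFTP client call), not a gLite API.

# Illustrative sketch of "reliable file transfer": retry a transfer with
# exponential back-off until it succeeds or the attempts are exhausted.
import time

def reliable_transfer(source_surl, dest_surl, transfer, max_attempts=5):
    delay = 10  # seconds before the first retry
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(source_surl, dest_surl)
            return True
        except Exception as err:
            print(f"attempt {attempt} failed: {err}")
            if attempt < max_attempts:
                time.sleep(delay)
                delay *= 2  # exponential back-off
    return False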

12

Catalogs

[Figure: the File Catalog maps LFNs to GUIDs and Site IDs; per-site Replica Catalogs (Site A, Site B) map GUIDs to SURLs; the Metadata Catalog maps LFNs to metadata]

• File Catalog
  Filesystem-like view on logical file names
  Keeps track of sites where data is stored
  Conflict resolution

• Replica Catalog
  Keeps replica information at a site (GUID → SURLs)

• (Metadata Catalog)
  Attributes of files on the logical level
  Boundary between generic middleware and the application layer
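A minimal Python sketch of how these catalogs fit together (illustrative data structures only, not the gLite catalog interfaces): the File Catalog maps an LFN to a GUID and the sites holding replicas, each site's Replica Catalog maps the GUID to SURLs, and the Metadata Catalog attaches attributes to the LFN. All names and URLs below are made up.

# Illustrative catalog model: resolve an LFN to its physical replicas.

file_catalog = {                      # LFN -> (GUID, sites holding replicas)
    "/grid/demo/higgs.root": ("guid-1234", ["SiteA", "SiteB"]),
}

replica_catalogs = {                  # per-site: GUID -> SURLs
    "SiteA": {"guid-1234": ["srm://se.sitea.example.org/data/higgs.root"]},
    "SiteB": {"guid-1234": ["srm://se.siteb.example.org/store/higgs.root"]},
}

metadata_catalog = {                  # LFN -> application-level attributes
    "/grid/demo/higgs.root": {"run": 4711, "type": "AOD"},
}

def resolve(lfn):
    """Return the GUID and every SURL that holds a replica of the given LFN."""
    guid, sites = file_catalog[lfn]
    surls = []
    for site in sites:
        surls.extend(replica_catalogs[site].get(guid, []))
    return guid, surls

print(resolve("/grid/demo/higgs.root"))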

13

Information and Monitoring

• R-GMA for the information system, system monitoring and application monitoring

• No major changes in architecture
  But re-engineer and harden the system

• Co-existence and interoperability with other systems is a goal
  E.g. MonALISA
  (a minimal producer/consumer sketch in this spirit follows below)

[Figure: e.g. D0 application monitoring: each job wrapper runs a Memory Primary Producer (MPP) feeding a Database Secondary Producer (DbSP)]
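The sketch below mimics this producer/consumer pattern in Python (it is not the R-GMA API): memory primary producers publish tuples from each job wrapper, and a database secondary producer aggregates them for queries.

# Illustrative producer/consumer model in the spirit of R-GMA-style monitoring.

class MemoryPrimaryProducer:          # "MPP" in the figure above
    def __init__(self):
        self.tuples = []

    def insert(self, **row):          # e.g. job status rows from a job wrapper
        self.tuples.append(row)

class DatabaseSecondaryProducer:      # "DbSP": aggregates several MPPs
    def __init__(self, sources):
        self.sources = sources

    def query(self, **criteria):
        for src in self.sources:
            for row in src.tuples:
                if all(row.get(k) == v for k, v in criteria.items()):
                    yield row

# One MPP per job wrapper, one DbSP aggregating them (as in the D0 example).
wrappers = [MemoryPrimaryProducer() for _ in range(3)]
wrappers[0].insert(job_id="job-1", status="Running", site="SiteA")
wrappers[1].insert(job_id="job-2", status="Done", site="SiteB")

dbsp = DatabaseSecondaryProducer(wrappers)
print(list(dbsp.query(status="Running")))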

14

Security

[Figure: pseudonymity flow on “the Grid”: 1. Joe obtains Grid (X.509) credentials; 2. the (optional) Pseudonymity Service maps “Joe → Zyx”; 3. the Attribute Authority issues Joe's privileges to Zyx (“User=Zyx, Issuer=Pseudo CA”); 4. the credential is kept in Credential Storage. Implementations: VOMS (attribute authority), myProxy (credential storage), pseudonymity service tbd; site access via GSI and LCAS/LCMAPS.]
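A rough Python sketch of the four steps in the figure, with hypothetical pseudonym_service, attribute_authority and credential_storage objects standing in for the real services; this is illustrative only, not a gLite or VOMS API.

# Purely illustrative pseudonymity flow (all objects and fields are made up).

def obtain_grid_credentials(user):
    # 1. Stand-in for obtaining X.509 credentials for the user.
    return {"subject": user, "issuer": "CA"}

def pseudonymize(credential, pseudonym_service):
    # 2. Optional: "Joe -> Zyx", signed by a pseudonym CA.
    pseudonym = pseudonym_service.new_pseudonym(credential["subject"])
    return {"subject": pseudonym, "issuer": "Pseudo CA"}

def attach_attributes(pseudo_credential, attribute_authority, user):
    # 3. "Issue Joe's privileges to Zyx" (VOMS-like attribute authority).
    pseudo_credential["attributes"] = attribute_authority.privileges_of(user)
    return pseudo_credential

def store(credential, credential_storage):
    # 4. Keep the credential in storage (myProxy-like) for later retrieval.
    credential_storage.put(credential["subject"], credential)

def pseudonymous_login(user, pseudonym_service, attribute_authority, credential_storage):
    cred = obtain_grid_credentials(user)                           # step 1
    pseudo = pseudonymize(cred, pseudonym_service)                 # step 2
    pseudo = attach_attributes(pseudo, attribute_authority, user)  # step 3
    store(pseudo, credential_storage)                              # step 4
    return pseudo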

15

GAS & Package Manager

• Grid Access Service (GAS)
  Discovers and manages services on behalf of the user
  File and metadata catalogs already integrated

• Package Manager
  Provides application software at the execution site
  Based upon existing solutions
  Details being worked out together with the experiments and operations
  (a minimal package-resolution sketch follows below)
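A minimal sketch of what “providing application software at the execution site” could look like, with a made-up package repository and dependency resolution; this is illustrative only, not the gLite Package Manager design.

# Illustrative package manager: resolve dependencies, install what is missing.

# Hypothetical package metadata: name -> (version, dependencies)
REPOSITORY = {
    "analysis-suite": ("1.2", ["root", "experiment-lib"]),
    "experiment-lib": ("4.0", ["root"]),
    "root": ("4.00.08", []),
}

def resolve(package, resolved=None):
    """Return packages in dependency order (dependencies first)."""
    if resolved is None:
        resolved = []
    _, deps = REPOSITORY[package]
    for dep in deps:
        resolve(dep, resolved)
    if package not in resolved:
        resolved.append(package)
    return resolved

def install_at_site(package, installed):
    for name in resolve(package):
        if name not in installed:
            version, _ = REPOSITORY[name]
            print(f"installing {name}-{version} at the execution site")
            installed.add(name)

install_at_site("analysis-suite", installed={"root"})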

16

Current Prototype

• WMS: AliEn TaskQueue, EDG WMS, EDG L&B (CNAF)

• CE (CERN, Wisconsin): Globus Gatekeeper, Condor-C, PBS/LSF, “pull component” (AliEn CE)

• WN: 23 at CERN + 1 at Wisconsin

• SE (CERN, Wisconsin): external SRM implementations (dCache, Castor), gLite-I/O

• Catalogs (CERN): AliEn FileCatalog, RLS (EDG), gLite Replica Catalog

• Data Scheduling (CERN): File Transfer Service (Stork)

• Data Transfer (CERN, Wisc): GridFTP

• Metadata Catalog (CERN): simple interface defined

• Information & Monitoring (CERN, Wisc): R-GMA

• Security: VOMS (CERN), myProxy, grid-mapfile and GSI security

• User Interface (CERN & Wisc): AliEn shell, CLIs and APIs, GAS

• Package Manager: prototype based on AliEn PM

17

Summary, plans

• Most Grid systems (including LCG-2) are oriented towards batch-job production; gLite addresses distributed analysis
  Most likely the two will co-exist, at least for a while

• A prototype exists; new services are being added: dynamic accounts, gLite CEmon, Globus RLS, File Placement Service, Data Scheduler, fine-grained authorization, accounting…

• A Pre-Production Testbed is being set up: more sites, tested/stable services

• First release due end of March 2005
  Functionality freeze at Christmas
  Intense integration and testing period from January to March 2005

• Second release candidate: November 2005
  May: revised architecture document; June: revised design document