C. Loomis – Status of European DataGrid – May 23, 2002 – 1 Status of European DataGrid Charles Loomis CNRS/LAL NorduGrid Workshop May 23, 2002

Status of European DataGrid

Charles Loomis

CNRS/LAL

NorduGrid Workshop

May 23, 2002

Introduction & Outline

European DataGrid

3-year EU-funded project

Goals:

—develop grid middleware

—deploy onto working testbed

—demonstrate grid technology with working applications

Strong application component unique!

Current Software

Machine Tour

Status

Testbed

Deployed software

Present & Future Sites

Near-term Developments

EDG v1.2

Latest Globus Release

EDG License

Longer-term Developments

Testing & Support Infrastructure

Enhanced EDG Features

Interoperability

Further Information

User Interface

Lightweight access to grid

Access from Laptop

No host certificate needed.

Some question about CRLs.

Limitations

Cannot run ftp daemon here.

Services:

UserInterface (CLI)

Globus GSI

globus-url-copy (client)

Development libraries

—BrokerInfo

—Replica Catalog APIs

—GDMP client interface

Resource Broker

Finds resources, submits & tracks jobs:

Heavyweight machine.

Talks to RC and MDS.

Acts as users’ network presence.

Talks to proxy server.

Bottleneck

Can replicate, but is that enough?

Services:

Resource Broker

JobSubmission Service

—built on Condor-G underneath

Information Index

Logging & Bookkeeping

GSI-ftp daemon
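
The "finds resources" step can be pictured with a small matchmaking sketch. This is only an illustration under stated assumptions, not the EDG broker's actual algorithm: the `ComputingElement` type, its fields, and the ranking rule (prefer a computing element close to a replica of the input data, then the one with the most free CPUs) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingElement:
    """Hypothetical view of what an information system might publish."""
    name: str
    free_cpus: int
    close_storage: frozenset  # names of storage elements considered "close"


def choose_ce(ces, replica_sites, min_cpus=1):
    """Return the preferred computing element, or None if none qualifies.

    ces           -- iterable of ComputingElement (stand-in for MDS data)
    replica_sites -- set of storage elements holding the input data
                     (stand-in for a Replica Catalog answer)
    """
    candidates = [ce for ce in ces if ce.free_cpus >= min_cpus]
    if not candidates:
        return None

    def rank(ce):
        # Tuples compare element by element: data locality first,
        # then the number of free CPUs.
        near_data = bool(ce.close_storage & replica_sites)
        return (near_data, ce.free_cpus)

    return max(candidates, key=rank)
```

In this sketch a small site holding the data wins over a larger site without it; a real broker would weigh many more attributes.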

Computing Element

Accepts & Executes Jobs:

Gatekeeper

—acts as public interface to computing resources

Worker Node(s)

—provides all software needed for applications

—accessible via batch system (PBS, LSF, …)

Services:

Gatekeeper

GSI-ftp daemon

GIIS/GRIS

Storage Element

Generic interface to storage:

Gatekeeper

—should go away

GSIFTP

RFIO

Services:

Gatekeeper

GDMP

GSI-ftp daemon

RFIO daemon

Replica Catalog

Provides information about replicas:

Catalog Service

—accessed via RB or directly

Services:

LDAP

GIIS/GRIS
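
Conceptually, "information about replicas" is a mapping from one logical name to the physical copies of that file. The sketch below is an in-memory stand-in only; the deployed service is LDAP-based, and the class name, method names, and the logical/physical file name terminology used here are assumptions for illustration.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory catalog: logical file name -> set of physical copies."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_name):
        """Record that physical_name is a copy of logical_name."""
        self._replicas[logical_name].add(physical_name)

    def lookup(self, logical_name):
        """Return all known copies, sorted; empty list if unknown."""
        return sorted(self._replicas.get(logical_name, ()))
```

A client, or the Resource Broker on the client's behalf, would call `lookup` to learn where the data lives before choosing where to run a job.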

Authorization/Authentication System

All based on GSI (PKI):

Certification Authorities

Virtual Organization Servers

Services:

LDAP for VO servers

Various software for CAs

mkgridmap generation software
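
The idea behind grid-mapfile generation can be sketched as follows: take the certificate subject names registered with each Virtual Organization and write one `"subject" account` line per user, which is the grid-mapfile format read by the Globus gatekeeper. This is not the mkgridmap tool itself; the function name, the input dictionaries, and the account names are hypothetical, and the real tool queries the VO LDAP servers rather than taking Python dictionaries.

```python
def make_gridmap(vo_members, vo_accounts):
    """Build grid-mapfile text from VO membership.

    vo_members  -- dict: VO name -> iterable of certificate subject names
    vo_accounts -- dict: VO name -> local account for that VO (hypothetical)

    A subject listed in several VOs is mapped once, to the account of the
    first VO in alphabetical order. VOs without an account are skipped.
    """
    lines = []
    seen = set()
    for vo in sorted(vo_members):
        account = vo_accounts.get(vo)
        if account is None:
            continue
        for subject in sorted(vo_members[vo]):
            if subject in seen:
                continue
            seen.add(subject)
            # Subjects contain spaces, so they are quoted in the mapfile.
            lines.append(f'"{subject}" {account}')
    return "".join(line + "\n" for line in lines)
```

Regenerating the file periodically from the VO servers keeps site authorization in step with VO membership without editing the mapfile by hand.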

Software Distribution & Installation

Storage:

Package repository

CVS server

Distribution

HTTP downloads

wget with rpm lists

most primitive link in chain

Installation

LCFG (LCFG-lite)

Only works for RH6.2
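
The "wget with rpm lists" step amounts to turning a list of package file names into download URLs. A minimal sketch, assuming a list format of one RPM file name per line with `#` comments, which the slides do not specify; the function name and the repository URL in the test are placeholders, not the project's real locations.

```python
def rpm_urls(list_text, base_url):
    """Return one download URL per package named in an RPM list.

    list_text -- contents of the list file (assumed format: one RPM file
                 name per line, blank lines and '#' comments ignored)
    base_url  -- HTTP location of the package repository
    """
    base = base_url.rstrip("/") + "/"
    urls = []
    for raw in list_text.splitlines():
        name = raw.split("#", 1)[0].strip()
        if name:
            urls.append(base + name)
    return urls
```

The resulting URLs could then be fetched with any HTTP client and the packages installed with `rpm`; nothing in this step checks dependencies or versions, which fits the description of it as the most primitive link in the chain.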

Software on “Production” Testbed

Stopped work on 1.1-series to focus on 1.2.

Deployed v1.1.4 plus patches; version is not uniform.

Significant functionality missing for applications.

—Replica Management

—Access to mass storage.

Difficult for middleware to support this version.

Testbed works, but…

Known stability problems:

—Information Index dies regularly.

—Broker needs to be restarted often.

Support limited

—Maintenance reduced to life support.

—Effort for new sites limited to “available effort.”

Production Testbed Sites

Production Sites

Most have dedicated hardware.

—Lyon running on main batch system.

Typically a few to tens of machines.

LCFG for installation and configuration.

—Lyon again the exception.

Limitations to Expansion

Information systems unreliable.

—manual registration is not scalable or dynamic

How to add countries without a CA?

—OK for users (CNRS CA)

—Not OK for host certificates.

Site              Location
Catania           Catania (I)
CC-IN2P3          Lyon (F)
CERN              Geneva (CH)
CNAF              Bologna (I)
Imperial College  London (UK)
MSU               Moscow (Russia)
NIKHEF            Amsterdam (NL)
Padova            Padova (I)
RAL               Rutherford (UK)
Torino            Torino (I)

Croatia

Taiwan

United States

EDG Release 1.2

New Features in 1.2 Release (10)

Replica Management API

—first implementation has limited API

Access to Mass Storage Systems

—authorization linked to user account mapping

Auto-resubmission of failed jobs.

—will help with stability problems (but is not a solution!)

Current Problems

GASS cache file locking problems (failed job submissions)

OpenLDAP timeout (II hangs; complete loss of MDS information)

FTree interfering with gatekeeper. (Causes crashes; failed submissions)
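
Auto-resubmission of failed jobs can be pictured as a bounded retry loop around the submission call. The sketch below is generic and makes no claim about how the EDG workload management software implements it; `submit` is a hypothetical callable that raises `RuntimeError` when a submission fails.

```python
def submit_with_resubmission(submit, max_attempts=3):
    """Call submit() until it succeeds or max_attempts is reached.

    Returns (result, errors), where errors lists the messages of the
    failed attempts that preceded the success. Raises RuntimeError if
    every attempt fails.
    """
    errors = []
    for _ in range(max_attempts):
        try:
            return submit(), errors
        except RuntimeError as exc:
            # Remember why this attempt failed, then try again.
            errors.append(str(exc))
    raise RuntimeError(
        f"job failed after {max_attempts} attempts: {errors}"
    )
```

Retrying hides transient failures such as the ones listed above from the user, but the recorded errors show that the underlying problems remain, which is why resubmission helps with stability without being a solution.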

Expected Schedule

[Figure: calendar of working weeks from May 13 to June 21. Milestones shown (their dates are not recoverable from the extracted text): ITeam at CERN; 1.2 alpha; RAL/CNAF Test; 3 Sites; Refine alpha; GASS/MDS Prbs.; JJ/Ingo Tests; <1% error rate; App. Testing; 1.2 beta; 1.3 code; license info; Deployment Decision; ESRIN Demo; Core Site Deployment; General Deployment.]

Upgrade to Latest Globus Release

EDG Globus beta-21 is based on first Globus2 beta.

Includes some patches for security.

Some EDG-specific patches.

(Larger changes for EDG 1.2.)

Upgrade to current Globus2 release depends on:

Desire of the applications groups

—Only known critical problem is with file transfers >20min.

Whether it contains fixes for GASS/MDS problems.

When EDG software for release 1.2 is deemed stable.

EDG 2.0 release in fall will be based on Globus2!

OGSA being evaluated, but no wholesale move yet.

Some new EDG software functions as a “Web Service”

Testing & Support

Testing Group

Goal: Intensive testing of releases

Provide framework for:

—unit tests

—integration tests

—stress tests

Provide material for objective evaluation of software for EU-review.

Use tests for:

—check of quality of software

—verification of functionality

—check configuration of new sites

Has started with EDG 1.2 (10).

—should have feedback for EDG 1.2 deployment decision

Support Infrastructure

Provide email-based support for both end-users and system administrators.

—ITeam and other experts

—New system administrator group

Tracking & follow-up of problems.

Create “knowledge base” for FAQs and typical problems.

Interact with LCG and CrossGrid to share the support effort.

System in place shortly; fully functional for Testbed2.

EDG Software License

EDG software license will be in BSD family (see EDG website):

OpenSource license.

Developments may be put back into code base.

Allows commercial use of code.

Standard license for most Grid-projects

—Exception: ClassAds, Condor-G will be LGPL.

EDG audit of external packages:

Necessary to ensure we can apply our own license.

Necessary to ensure that we properly attribute other groups’ work.

Need to be especially careful with GPL code.

—Ensure that core functionality is consistent with license.

—LCFG will likely be under the GPL rather than the EDG license.

Release Schedule

Moved to iterative releases:

Keep developments compatible.

Provide intermediate checks on progress.

Allow applications to evaluate functionality.

Not all intermediate releases will be deployed!

Release 2.0 is hard deadline; others somewhat flexible.

Details in “Release Plan” document on web site, highlights…

Release  Date
1.1      Jan. 31
1.2      March 31
1.3      May 31
1.4      July 31
2.0      Sept. 30

Release 1.2

General

Emphasis on stability.

Deploy as production release.

Globus

Uses first Globus2 beta (beta-21)

Plus EDG patches.

Workload Management (WP1)

Proxy renewal for long jobs.

Auto-resubmission of failed jobs.

Data Management (WP2)

Replica Manager (first impl.)

GDMP 3.0

Fabric Management (WP4)

Updated LCFG

EDG Gatekeeper (LCAS)

Storage Element (WP5)

Access to existing data in MSS.

Networking (WP7)

Publish network data into MDS.

Release 1.3

General

Autobuild all EDG packages.

Copyright and license for code.

Globus

Update to latest Globus2 release

Workload Management (WP1)

C APIs

MPICH support.

Data Management (WP2)

Replica Manager

Replica Location Service (giggle)

Grid Mon./Info. Services (WP3)

R-GMA deployed in parallel with MDS

Fabric Management (WP4)

EDG JobManager

Storage Element (WP5)

RFIO with GSI

Prototype GridFTP with MSS access.

Networking (WP7)

Network cost function.

Release 1.4

General

Support RH6.2, RH7.2

GLUE Schema

New authorization scheme.

Workload Management (WP1)

Interactive jobs.

Job dependencies.

Triggered file transfers.

Data Management (WP2)

Replica Manager with Optimiser

SpitFire beta release.

Grid Mon./Info. Services (WP3)

Better integration of R-GMA.

Unified (GLUE) schema.

Fabric Management (WP4)

KickStart translator.

Monitoring & Alarms.

Condor supported.

Storage Element (WP5)

DiskManager for disk-only SE.

Testbed (WP6)

New authorization scheme.

Networking (WP7)

Publication of network metrics.

Release 2.0

General

Support RH6.2, RH7.2, Solaris?

Workload Management (WP1)

Job checkpointing.

Accounting.

Advance reservation.

Data Management (WP2)

Full integration of components.

Grid Mon./Info. Services (WP3)

R-GMA WebServices

Fabric Management (WP4)

HLD templates.

Credential service (LCMAPS).

Storage Element (WP5)

DiskManager access to all HSM.

Reservation, pinning, quotas.

Testbed (WP6)

Laptop based UI machine.

Networking (WP7)

Network cost for all sites.

Interoperability

Working with GriPhyN, PPDG, iVDGL, DataTag, CrossGrid,

First concrete example is GLUE schema.

Places for conflict:

Information systems

Agreed interfaces

Further Information

Interesting web sites:

EDG: http://www.eu-datagrid.org/

—general information about EDG project

—links to all work package web sites

WP6: http://marianne.in2p3.fr/

—support information (contacts, bug reporting, documentation, mailing lists)

—meeting agenda/minutes

—links to source code in CVS; packages in package repository

Bleeding-edge information:

[email protected]

Warning: this is a high-volume list!