gLite – An Outsider’s View

Stephen Burke, RAL

January 31st 2005 gLite overview

Introduction

• A personal view of the current situation
  – Asked to be provocative!
  – Some things may be wrong
    • Accurate information can be hard to obtain
• History
• Current situation
• Future

What was supposed to happen

• The original idea was to harden/re-engineer the deployed LCG middleware
  – Short development cycles driven by user feedback
  – No “big bang” releases
  – No major new development
• Autumn 2003: the ARDA RTAG recommended a new architecture based on AliEn
  – Set up a prototype system quickly
  – Rapid development endorsed

What actually happened

• EGEE started well after EDG finished
  – Large gap: December 2003 -> April 2004
• EDG infrastructure (CVS, build system, bug tracking, developer guidelines, testbeds, …) all scrapped
  – New system may be better (?), but it took ~7 months to put it in place
• JRA1 “prototype” was essentially AliEn
  – Only two sites, of which one was hardly supported
  – ARDA project was set up and started using the prototype
    • The members (some with no LCG experience) got used to AliEn
• LCG forged ahead with middleware improvements
  – Middleware quality/stability much improved
  – But the experiments are still unhappy

AliEn -> gLite

• JRA1 wrote architecture and design documents for a major middleware development project
  – EDG experience suggests it will take years, not months
  – Not obviously driven by NA4 or SA1 requirements
  – AliEn pushed aside
    • RB will support pull model as well as push
  – Migration to web services – everyone seems to like this, but what is the real gain in the short term?
    • Web service code mostly not available yet anyway
• Big bang release is back!
  – Hardly any testing so far by SA1/LCG
  – Or most users
  – Information is limited

Testbeds

• EDG testbed(s) and ITeam were successful
  – Both effectively lost in EGEE
• “Prototype” testbed not very useful
  – Effectively just one site, few machines
  – Not really a prototype – misled people about what to expect
• JRA1 testing testbed and test team effectively co-opted for integration
  – Reduced resources for testing
  – Few sites, limited manpower
  – Already ~600 bugs in Savannah, growing rapidly
    • 260 closed, another 130 fixed and being tested
    • EDG had ~2500 bugs by the end
• SA1 PPS just starting at the start of 2005
  – Role still unclear

Workload management

• Development of the EDG/LCG RB
  – Seems to be largely backward-compatible
  – Only user docs so far are the EDG manuals
• Not clear what new features are available
  – Or whether LCG mods are included
• Not AliEn!
  – Support for pull model via new CEMon component
• Still uses BDII with GLUE schema
  – Should change to R-GMA (?)
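
For readers unfamiliar with the push/pull distinction mentioned above, here is a minimal sketch. It is not gLite, RB, or CEMon code: the classes `ComputeElement`, `PushBroker` and `PullBroker` are hypothetical names invented for illustration. In push mode the broker picks a site from the information it holds; in pull mode jobs wait centrally until a site asks for work.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ComputeElement:
    """Hypothetical stand-in for a site's compute element (CE)."""
    name: str
    free_slots: int
    running: list = field(default_factory=list)

    def accept(self, job: str) -> None:
        self.free_slots -= 1
        self.running.append(job)


class PushBroker:
    """Push model: the broker chooses a CE from its own view of the sites."""

    def __init__(self, ces):
        self.ces = ces

    def submit(self, job: str) -> str:
        # Pick the CE advertising the most free slots and send the job there.
        target = max(self.ces, key=lambda ce: ce.free_slots)
        if target.free_slots <= 0:
            raise RuntimeError("no free slots advertised")
        target.accept(job)
        return target.name


class PullBroker:
    """Pull model: jobs wait in a central queue until a CE asks for work."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job: str) -> None:
        self.queue.append(job)

    def request_work(self, ce: ComputeElement):
        # Called on behalf of the CE when it has a free slot.
        if ce.free_slots <= 0 or not self.queue:
            return None
        job = self.queue.popleft()
        ce.accept(job)
        return job


def demo():
    # Push: three jobs placed across two sites by the broker.
    ces = [ComputeElement("site-a", 2), ComputeElement("site-b", 1)]
    push = PushBroker(ces)
    placed = [push.submit(f"job{i}") for i in range(3)]

    # Pull: a site with one free slot asks for work twice.
    pull = PullBroker()
    for i in range(2):
        pull.submit(f"task{i}")
    worker = ComputeElement("site-c", 1)
    pulled = [pull.request_work(worker), pull.request_work(worker)]
    return placed, pulled
```

In the push case the broker depends on the information system being up to date; in the pull case the site decides when it is ready, at the cost of jobs waiting in a central queue.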

Data Management

• Very complex design, largely new code
  – No real user documentation, just javadoc
  – Mostly not delivered yet
  – Not clear how much will be in RC1
• Metadata activities also in ARDA and GridPP
• Still seem to be developing the architecture
  – Particularly the interaction with the WMS
  – WMS hedging its bets
    • Supports both (gLite and LCG) systems
• LCG has also been developing DM tools
  – New file catalogue on its way
  – How do they relate to gLite?

Data Storage

• gLite has no development of its own, relying on SRM projects
• EDG-SE was not stable enough for production
  – Still in development?
• dCache almost ready, but has taken ~18 months and still has many bugs
  – Support unclear
• New LCG Disk Pool Manager
  – Only an alpha version so far
• Is storage management really this hard?
  – Will the “classic SE” ever die?!

R-GMA

• Should be an information system
  – But both LCG and gLite still use BDII
• Some user documentation available
• gLite version is fairly backward compatible with LCG version
  – No web services yet
  – gLite version getting “fast track” into LCG
• Still few users
  – But needed for APEL accounting

Security

• “Security must be built in from the start”
  – So it gets a separate activity!
• EGEE security requirements document nearly identical to EDG D7.5 from May 2002
  – Which was mostly not implemented …
• Both LCG and gLite intend to use VOMS
  – But still not yet integrated with most middleware
  – No real strategy for how to use it?
  – Who “owns” VOMS?

Others

• Package management
  – gLite is developing a software package manager
    • and so is LCG!
  – May be useful, no experience yet
• GAS
  – Came with AliEn, not clear if anyone wants it

Operational issues

• System Design
  – Neither SA1 nor JRA1 has anyone designing how the complete system should work
• Configuration
  – The existing system has a very complex configuration which is the source of many problems
  – Being addressed in JRA1, but not clear if it will really make things better
• Stability and debugging
  – In a big Grid some things are always broken
  – Error messages and logging must allow problems to be traced
  – Services need to be fault-tolerant
  – Not clear if JRA1 is addressing this
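
As a rough illustration of the “traceable logging plus fault tolerance” point, the sketch below retries a failing call and tags every log line with a job identifier so a failure can be followed through the logs. Everything here is an assumption made for illustration (`call_with_retry`, `make_flaky`, the `grid.sketch` logger name, the log format); none of it is taken from the LCG or gLite code.

```python
import logging
import time

log = logging.getLogger("grid.sketch")  # hypothetical logger name


def call_with_retry(operation, job_id, attempts=3, delay=0.0):
    """Retry a flaky call, logging each failure with the job ID for tracing."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            log.warning("job=%s attempt=%d/%d failed: %s",
                        job_id, attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay)
    # Give up, but keep the original exception attached as the cause.
    raise RuntimeError(
        f"job={job_id} failed after {attempts} attempts"
    ) from last_error


def make_flaky(failures):
    """Return a callable that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("service unavailable")
        return "ok"

    return operation
```

For example, `call_with_retry(make_flaky(2), "job-1")` succeeds on the third attempt, while a service that keeps failing ends in a single error that still carries the job ID and the underlying cause.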

What happens next?

• Code being delivered to SA1, will run on PPS
• All serious bugs supposed to be fixed by March
  – EDG experience is that it took >1 year to go from code delivery to production use – some things never made it!
• Migration strategy?
  – Hard if you don’t know what will work
  – LCG has its own developments, especially in data management
  – New R-GMA is largely backward-compatible
    • And not critical yet
  – New RB seems similar to current version
    • At least in push mode
    • ALICE (and LHCb?) want AliEn
  – Data management is completely different
• Big bang releases
  – Code has now been branched
  – Will developers be keen to fix bugs in the “old” branch?
  – “Wait for the next version, it will all be fixed then”!