LCG & EGEE Status & Overview, GridPP9, February 4th 2004, Tony.Cass@CERN.ch


Page 1

LCG & EGEE Status & Overview

GridPP9

February 4th 2004

Tony.Cass@CERN.ch

Page 2

Agenda

LCG
  – LHCC Review
  – Area Status
    » Applications
    » Fabric
    » Grid Deployment
    » Grid Technology
  – LCG & GridPP2

EGEE @ CERN

Page 3

LCG – LHCC Review

LHCC comprehensive review of the project, 24th/25th November.
  – See http://agenda.cern.ch/fullAgenda.php?ida=a035729

Preceded by:
  – Applications Area review, as part of overall experiment software planning, 4th September.
    » See http://agenda.cern.ch/fullAgenda.php?ida=a032308
  – LCG internal review of the Grid & Fabric areas, 17th–19th November.
    » See http://agenda.cern.ch/fullAgenda.php?ida=a035728

Page 4

LCG – Review Conclusions

The LHCC noted significant progress: “It is realistic to expect the LCG project to have prototyped, built and commissioned the initial LHC computing environment”.

The LHCC noted the good progress in the Applications Area.

No problems for Fabrics!

Usual worries about Grid deployment, middleware development and middleware directions (ARDA), but the review committee considered that the project is/was steering an appropriate course.

GridPP-funded manpower is a substantial factor behind the progress noted by the LHCC reviewers!

Page 5

LCG – Applications Area

From the Applications Area report for Q4 2003:
  – “The applications area in this quarter continued to move through a period in which rapid-paced development and feedback-driven debugging are giving way to consolidation of the software in a number of areas and increased direct support for experiment integration.”

Internal Applications Area review in October, prior to the LHCC review.
  – Review report reflected in AA plans for 2004.
  – In particular, the recommendation for a closer relationship with the ROOT team is being followed up in the areas of dictionary & maths libraries.
    » SEAL and ROOT teams developing proposed workplans for consideration in Q1 this year.

Page 6

LCG – Applications Area

POOL
  – passed a major integration and deployment milestone with production use by CMS: millions of events written per week;
  – no longer on the CMS critical path to Data Challenge readiness, a major success for the POOL team and CMS.

Simulation project
  – completed important milestones (initial cycle of EM physics validation), drew close to completing others (revisiting of physics requirements, hadronic physics validation), and made an important clarification of the objectives and programme of the generic simulation framework subproject.
  – Maybe not directly Grid-related, but the LHCC review “underlined the importance of supporting the Monte Carlo generator codes for the experiments.”

Other items
  – SEAL and POOL now available for Windows; initial PI programme essentially complete; ARDA RTAG report.

Page 7

LCG – Fabric Area

Extremely Large Farm management system expanding its control of the CERN fabric
  – quattor management of CPU servers being extended to disk & tape servers (including CASTOR configuration). Disk configuration stored in CDB using HMS.
  – Lemon: EDG/WP4 repository in production since September.
  – LEAF: new State Management System being used to move systems into/out of production and to drive, e.g., kernel upgrades.
  – All computer centre machines registered in quattor/CDB.
  – Use of ELFms tools, particularly quattor, for management of experiment farms is under discussion (and test).

CERN Computer Centre upgrade continues.
  – Substation civil engineering almost complete; electrical equipment starts to arrive in March.
  – RHS of machine room upgraded: major equipment migration to free the LHS is in preparation!

Page 8

LCG – Fabric Area

Phase II purchasing process starting now
  – See talk at http://agenda.cern.ch/fullAgenda.php?ida=a036781.
  – Long lead time before 2006 purchases, given CERN rules.
    » Install early in 2006. Volumes are large, especially for disk servers.
  – Plan to qualify suppliers of “generic PCs”
    » “Intel-like architecture” about the only requirement
    » Selection criteria for companies are the major consideration at present. Plan careful evaluation of potential bidders in Autumn.
  – Expect CPU servers to be commodity equipment, as at present.
  – Disk server area is a major concern.
    » Significant problems with EIDE servers in 2003. Reasons not fully understood (yet!). Procedures and control much improved since November (end of 2003 data taking).
    » Still, these servers are significantly cheaper than alternatives. We need to be able to deal with hardware failures in this area.

CMS and ATLAS are watching our plans closely.
  – Common suppliers for Tier0/1 and online farms?

Page 9

LCG – Grid Deployment

LCG1 service now covers 28 sites. Major productions for ATLAS and CMS during the Christmas holidays.
  – CMS: 600K events; sites mainly in Italy & Spain.
  – ATLAS: 14K events (although only 75 jobs).
  – US/ATLAS sent requests for job execution to LCG-1 from the US Grid3 infrastructure. After some work, events were successfully generated using the LCG-1 sites CERN, Turin and Brookhaven, with the output data staged at the University of Chicago and registered in the Globus RLS.

LCG2 service is on a smaller number of sites
  – Avoid configuration and stability issues
  – Require commitment of sufficient support effort and compute resources

Page 10

LCG2 Core Sites and Commitments

Site        Immediate     Later
Initial LCG-2 core sites:
CERN        200           1200
CNAF        200           500
FNAL        10            ?
FZK         100           ?
Nikhef      124           180
PIC         100           300
RAL         70            250
Taipei      60            ?
Other firm commitments:
Russia      30            50
Prague      17            40
Budapest    100           ?
Totals      864 (+147)    >2600 (+>90)

Will bring in the other 20 LCG-1 sites as quickly as possible.
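Each cell of the Totals row packs two sums. A quick check, assuming the main figure covers the eight initial core sites (CERN to Taipei) and the parenthesised figure covers the other firm commitments (Russia, Prague, Budapest), reproduces the Immediate column. The Later column cannot be summed the same way, since several sites are listed as “?”.

```python
# Immediate commitments per site, copied from the table.
# Assumption: the first eight rows are the initial LCG-2 core sites and
# the last three rows are the "other firm commitments".
core = {
    "CERN": 200, "CNAF": 200, "FNAL": 10, "FZK": 100,
    "Nikhef": 124, "PIC": 100, "RAL": 70, "Taipei": 60,
}
other = {"Russia": 30, "Prague": 17, "Budapest": 100}

core_total = sum(core.values())
other_total = sum(other.values())
print(f"{core_total} (+{other_total})")  # prints "864 (+147)"

# Later commitments for the other sites: Budapest is "?", so only a lower bound.
later_other_known = {"Russia": 50, "Prague": 40}
print(sum(later_other_known.values()))  # prints 90, hence "+>90"
```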

Page 11

LCG2 Functionality

General
  – CondorG
    » new grid manager (critical, now in official VDT); gahp-server (critical, local, with Condor team now); scheduler, memory usage (with Condor team)
  – Globus
    » RM wouldn't work behind the firewall; prevent occasional hangs of CE; a number of errors in the handling of return status from various functions
    » Refrained from putting all fixes into 2.2.x, knowing that they would be included in 2.4.3.
  – RB
    » new WP1 fixed a number of LCG-1 problems (reported by LCG)
    » on top of this, we fixed (with the WP1 team) memory leaks in Interlockd, the network server & a filelist problem
  – CE
    » memory leaks

Installation
  – WN installation independent from LCFGng (or other tools)
  – LCFGng still required for service nodes

Still require outbound IP connectivity from WNs
  – Work to be done to address this in Replica Manager
  – Add statement to security policy to recognise the need, but limit it: applications must not rely on this

Page 12

LCG2 Status

Generally OK, but delayed by problems in the SE area.

Intention was to use SRM interfaces for Castor and dCache, but there are still problems…

Agreed now to continue for the present with gridftp access to storage.
  – dCache will be available as a space manager for sites without one, but not using the SRM interface initially.

Joint testing with ALICE starts this week.

Page 13

LCG – Grid Technology

Key topic has been, of course, the direction of Grid middleware.

ARDA started as an RTAG in the Applications Area to define the completion of the Physicist Interface programme (distributed analysis). Much discussion, though, on the Grid middleware and its impact on the DA framework.

ARDA workshop held at CERN, January 21st/22nd, to plan post-RTAG developments.
  – See report later this afternoon
    » and you’ve just heard from Tony!
  – also http://agenda.cern.ch/fullAgenda.php?ida=a036745.

Page 14

LCG & GridPP2

Remember: the delay of the LHC to April 2007 means LCG and GridPP are now out of phase. LCG phase II starts only in January 2006.
  – Work programme and plan both exist, however, and there is a shortfall in resources, principally lack of staff.
  – Strong desire to maintain the UK contribution (and influence) and links between GridPP & LCG.
  – UK message that a clear case must be made is understood.

Discussions with the new CERN management are starting.
  – £1M would support 5 FTE over the 3 years (cf. 25+ now and 10 in the GridPP2 proposal). Work areas to be agreed.

Existing GridPP-funded staff have ~1 year left before the end of their contracts. There will be a review of post effectiveness similar to that just completed for other GridPP posts.

Page 15

EGEE @ CERN

See Neil for the high-level politics!

Implementation Plan
  – Initial service will be based on the LCG infrastructure (this will be the production service, most resources allocated here)
    » Cross membership of LCG & EGEE project management boards.
  – Also will need a certification test-bed system
    » For debugging and problem resolving of the production system
  – In parallel must deploy a development service
    » Runs the candidate next software release for production
    » Treated as a reliable facility (but with less support than the production service)

EGEE All Activities Meeting, January 13th/14th
  – see http://agenda.cern.ch/fullAgenda.php?ida=a036343.

Two areas managed by CERN
  – JRA1/Middleware: Frederic Hemmer
  – SA1/Operations: Ian Bird
  – Significant effort in the recruitment area over the last 2 months. Four boards held. 19 job offers made to date. CERN support for at least one person prior to the April project start.

Page 16

Conclusion

Good progress in all areas :->

As ever, strongly supported by GridPP-funded effort at CERN.