Northgrid Status Alessandra Forti Gridpp20 Dublin 12 March 2008.


Outline

• General status
• Manpower
• Other VOs
• ATLAS shifts
• Site news
• Conclusions

General Status (1)

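Site       | Middleware | OS        | SRM 2.2    | Space tokens  | SRM brand     | CPU (kSI2K) | Storage (TB) | Used storage (TB) | Average availability
-----------+------------+-----------+------------+---------------+---------------+-------------+--------------+-------------------+---------------------
Lancaster  | gLite 3.1  | SL4       | yes        | yes           | dCache -> DPM | 476.2       | 80           | 39.6              | 82%
Liverpool  | gLite 3.1  | SL4       | installing | working on it | dCache        | 592         | 13           | 1.9               | 80%
Manchester | gLite 3.1  | upgrading | installing | working on it | dCache        | 2160        | 162          | 19.2              | 83%
Sheffield  | gLite 3.1  | upgrading | yes        | yes           | DPM           | 182.5       | 4.5          | 0.7               | 90%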

General Status (2)

General Status (3)

Manpower

• Lancaster:
  – Brian Davies
  – Matt Doidge, Peter Love

• Liverpool:
  – Pawel Trepka
  – Rob Fay, John Bland

• Manchester:
  – Colin Morey
  – Owen McShane, Stuart Wild, Sergey Dolgodobrov

• Sheffield:
  – Dominic Wilson
  – Elena Korolkova, Matt Robinson

Other VOs

• A Northgrid VO has been created for users without a VO and is being deployed at the sites.
  – Some users have already subscribed to it.
  – Users currently in the gridpp VO will be moved to northgrid.

• Other VOs are running on our systems:
  – ~24 VOs enabled across all sites
  – hone, dzero and biomed lead the CPU usage among the 'Other VOs'

ATLAS shifts

• Northgrid is among the biggest suppliers of shifters: 5 people from across all sites are already involved.
  – Carl Gwilliams: expert shifter
  – Peter Love and Alessandra Forti: senior shifters
  – Mark Hodgkinson, Paul Hogson: trainees

• The benefits are evident: site managers gain an inside perspective on ATLAS problems, and ATLAS benefits from the feedback of sysadmin shifters.

Lancaster news

• The UKLight link to RAL had problems, affecting ATLAS uploads into RAL (cause: a bad 10G card in the core Ciena kit in Reading).
  – http://www.gridpp.rl.ac.uk/blog/2008/02/29/a-week-of-woe-on-the-lancaster-link/

• A power cut destroyed the dCache system disk.
  – This forced a fresh install and an upgrade to SL4.
  – The dCache install was not smooth.

• Migrating from dCache to DPM:
  – DPM installation was trivial, up and running with no problems
  – ATLAS production is now on DPM
  – Space tokens are in place
  – Data migration is under way

• This weekend, FTS problems from RAL (diagnosis ongoing).
  – Active transfers are still not back to normal: http://tinyurl.com/25fhqm

• New data centre: hoarding is going up on site (http://www.lancs.ac.uk/depts/estates/projects/iss.htm).

Liverpool News

• Cluster upgraded to SL4.
• Working on the dCache upgrade and on enabling space tokens.
  – dCache installation not smooth
• Installed a new, more powerful CE and SE.
• Upgraded the rack software servers to 250 GB RAID1 to cope with the >100 GB size of the ATLAS code.
• Still testing Puppet as the preferred fabric management solution.

Manchester News

• The Manchester setup has been completely reorganised:
  – cfengine configuration rewritten by task rather than by host type
  – All the quick-and-dirty extra steps have been cleaned up and are now handled by cfengine
  – Test and trash machines have been reorganised, and their installation no longer requires any special handling, inside or outside cfengine
  – All the certificates have been renewed in one go thanks to the new bulk request/renewal script
  – Still in the process of upgrading the first cluster, as dCache proved more complicated than it should be. The WNs+pools and CEs are ready to go, though.

• Plan to go ahead and deal with the dCache head node more slowly.

• GGUS tickets are a sorer point than ever:
  – GGUS opens a ticket in RT at each reply…

Sheffield news

• Benefited from the staff change:
  – Elena is located in the physics department.

• Upgraded to SRM 2.2
• Enabled space tokens for ATLAS
• Added 2.5 TB of storage
• Problems with APEL accounting, due to APEL using the wrong batch system
• Problems with biomed jobs hanging for >70 because there is no timeout when a remote server doesn't reply
  – Still handled with manual monitoring

Conclusions

• Northgrid is in a healthy state.

• Upgrades to SL4 and SRM 2.2, and the enabling of space tokens, are under way.
  – We should make the deadlines.

• The main problems at the moment:
  – sysadmin turnover
  – dCache installation, upgrade and setup are not smooth

• Well integrated with ATLAS, and good use of resources by other user communities.