Northgrid Status Alessandra Forti Gridpp20 Dublin 12 March 2008.
Northgrid Status
Alessandra Forti, GridPP20, Dublin, 12 March 2008
Layout
• General status
• Manpower
• Other VOs
• Atlas shifts
• Site news
• Conclusions
General Status (1)
Site       | Middleware | OS        | SRM 2.2    | Space Tokens  | SRM brand     | CPU (kSI2K) | Storage (TB) | Used Storage (TB) | Average availability
Lancaster  | gLite 3.1  | SL4       | yes        | yes           | dCache -> DPM | 476.2       | 80           | 39.6              | 82%
Liverpool  | gLite 3.1  | SL4       | installing | working on it | dCache        | 592         | 13           | 1.9               | 80%
Manchester | gLite 3.1  | upgrading | installing | working on it | dCache        | 2160        | 162          | 19.2              | 83%
Sheffield  | gLite 3.1  | upgrading | yes        | yes           | DPM           | 182.5       | 4.5          | 0.7               | 90%
General Status (2)
General Status (3)
Manpower
• Lancaster:
  – Brian Davies
  – Matt Doidge, Peter Love
• Liverpool:
  – Pawel Trepka
  – Rob Fay, John Bland
• Manchester:
  – Colin Morey
  – Owen McShane, Stuart Wild, Sergey Dolgodobrov
• Sheffield:
  – Dominic Wilson
  – Elena Korolkova, Matt Robinson
Other VOs
• The Northgrid VO has been created for VO-less users and is being installed.
  – Some users have already subscribed to it
  – Users now in the gridpp VO will be moved to northgrid
• Other VOs are running on our systems
  – ~24 enabled across all sites
  – hone, dzero and biomed lead the CPU usage among ‘Other VOs’
Atlas Shifts
• Northgrid is among the biggest suppliers of shifters: 5 people from all sites are already involved
  – Carl Gwilliams: expert shifter
  – Peter Love and Alessandra Forti: senior shifters
  – Mark Hodgkinson, Paul Hogson: trainees
• The benefits are evident: site managers gain an inside perspective on Atlas problems, and Atlas benefits from the feedback of sysadmin shifters.
Lancaster news
• The UKLight link to RAL had problems, which affected Atlas uploads into RAL (cause: a bad 10G card in the core Ciena kit in Reading)
  – http://www.gridpp.rl.ac.uk/blog/2008/02/29/a-week-of-woe-on-the-lancaster-link/
• A power cut toasted the dCache system disk
  – Forced a fresh install and an upgrade to SL4
  – dCache install was not smooth
• Migrating from dCache to DPM
  – DPM installation trivial, up and running with no problems
  – Atlas production now on DPM
  – Space tokens in place
  – Data migration underway
• This weekend, FTS problems from RAL (diagnosis ongoing)
  – Active transfers still not back to normal: http://tinyurl.com/25fhqm
• New data centre: hoarding is going up on site (http://www.lancs.ac.uk/depts/estates/projects/iss.htm)
Liverpool News
• Cluster upgraded to SL4
• Working on the dCache upgrade and on enabling Space Tokens
  – dCache installation not smooth
• Installed a new, more powerful CE and SE
• Upgraded the rack software servers to 250 GB RAID1 to cope with the >100 GB size of the ATLAS code
• Still testing Puppet as our preferred fabric management solution
Manchester News
• The Manchester setup has been completely reorganised
  – cfengine configuration rewritten by task rather than by host type
  – All the quick-and-dirty extra steps have been cleaned up and are now handled by cfengine
  – Test and trash machines have been reorganised, and their installation no longer requires any special handling, inside or outside cfengine
  – All the certificates have been renewed in one go thanks to the new bulk request/renewal script
  – Still in the process of upgrading the first cluster, as dCache proved more complicated than it should have been. The WN+pools and CEs are ready to go, though.
• Plan to go ahead and deal with the dCache head node more slowly
• Tickets from GGUS are a sorer point than ever
  – GGUS opens a new ticket in RT at each reply…
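To illustrate what organising configuration by task rather than by host type means, here is a minimal Python model. It is not cfengine syntax and not the actual Manchester configuration; the task names, steps and host names are hypothetical. A host's configuration is simply the combination of the tasks assigned to it, so a worker node that is also a dCache pool needs no special host type of its own.

```python
# Hypothetical model of a task-based configuration layout (NOT cfengine syntax).
# Task names, steps and host names are invented for illustration.

# Each task owns the configuration steps it needs.
TASKS: dict[str, list[str]] = {
    "batch_worker": ["configure batch client", "mount experiment software area"],
    "dcache_pool": ["install dCache pool", "configure pool partitions"],
    "grid_host": ["install host certificate", "configure gLite services"],
}

# Hosts list the tasks they perform instead of belonging to a single host type.
HOSTS: dict[str, list[str]] = {
    "wn001": ["batch_worker", "dcache_pool"],  # worker node that is also a dCache pool
    "ce01": ["grid_host"],
}


def actions_for(host: str) -> list[str]:
    """Return the ordered, de-duplicated configuration steps for a host."""
    steps: list[str] = []
    for task in HOSTS.get(host, []):
        for step in TASKS[task]:
            if step not in steps:
                steps.append(step)
    return steps


if __name__ == "__main__":
    for name in HOSTS:
        print(name, "->", actions_for(name))
```

With this layout, adding a new kind of machine means listing its tasks rather than writing a new host-type section with its own copy of the shared steps.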
Sheffield news
• Benefited from the staff change
  – Elena is located in the physics department
• Upgraded to SRM 2.2
• Enabled Space Tokens for Atlas
• Added 2.5 TB of storage
• Problems with APEL accounting, due to APEL using the wrong batch system
• Problems with biomed jobs hanging for >70 because there is no timeout when a remote server doesn't reply
  – Still handled with manual monitoring
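The hanging biomed jobs come down to waiting on a remote server with no timeout. The general mechanism can be sketched with Python's standard `socket` module: a local server that accepts a connection and never replies stands in for the unresponsive remote server, and setting a timeout turns an open-ended wait into an error the job can handle. This illustrates the principle only and is not the biomed job code; `silent_server` and `fetch_with_timeout` are hypothetical helpers.

```python
import socket
import threading
from typing import Optional


def silent_server(listener: socket.socket) -> None:
    """Accept one connection and send nothing, like an unresponsive remote server."""
    conn, _ = listener.accept()
    threading.Event().wait(2.0)  # keep the connection open without replying
    conn.close()


def fetch_with_timeout(host: str, port: int, timeout: float) -> Optional[bytes]:
    """Wait at most `timeout` seconds for a reply; return None instead of hanging."""
    # create_connection applies the timeout to the connect and to later recv() calls
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(1024)
        except socket.timeout:
            return None  # no reply in time: the caller can fail cleanly


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    threading.Thread(target=silent_server, args=(listener,), daemon=True).start()
    reply = fetch_with_timeout("127.0.0.1", listener.getsockname()[1], timeout=0.5)
    print("timed out" if reply is None else reply)
```

The same idea applies to whatever client the job uses to talk to the remote server: a bounded wait lets the job fail on its own instead of hanging until someone spots it by hand.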
Conclusions
• Northgrid is in a healthy state
• Upgrades to SL4 and SRM 2.2, and the enabling of space tokens, are under way
  – We should make the deadlines
• The main problems at the moment:
  – Sysadmin turnover
  – dCache installation/upgrade and setup is not smooth
• Well integrated with Atlas, and well exploited by other user communities