Post on 20-Jan-2016
HEP Computing Status
Sheffield University
Matt Robinson
Paul Hodgson
Andrew Beresford
Interactive Cluster
• 30 self-built Linux boxes
• AMD Athlon XP CPUs, 256/512 MB RAM
• OS: Scientific Linux 303
• 100 Mbit network
• NIS for authentication, NFS mounts for /home etc.
• System install using kickstart + post-install scripts
• Separate backup machine
• 15 laptops, mostly dual boot
• Some Macs and one Windows box
• 3 disk servers mounted as /data1, /data2 etc. (a few TB)
Batch Cluster
• 100 CPU farm, Athlon XP 2400/2800
• OS: Scientific Linux 303
• NFS-mounted /home and /data
• OpenPBS batch system for job submission
• Gigabit backbone with 100 Mbit to worker nodes
• Disk server provides 1.3 TB as /data (RAID 5)
• Entire cluster assembled in house from OEM components for less than 50k
• Hard part was finding an air-conditioned room with sufficient power
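As an illustration of job submission on an OpenPBS farm like this one, the sketch below builds a minimal job script. The job name, command and walltime are hypothetical placeholders, not taken from the Sheffield setup; only the standard `#PBS` directives (`-N`, `-l`, `-j`) and the `PBS_O_WORKDIR` variable are assumed.

```python
def make_pbs_script(job_name, command, walltime="01:00:00"):
    """Return the text of a single-node PBS job script.

    job_name, command and walltime are illustrative values chosen
    by the caller; nothing here reflects the site's real queues.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",           # job name shown by qstat
        "#PBS -l nodes=1",               # request one worker node
        f"#PBS -l walltime={walltime}",  # wall-clock limit, HH:MM:SS
        "#PBS -j oe",                    # merge stderr into stdout
        "cd $PBS_O_WORKDIR",             # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: a wrapper script for a simulation run.
    print(make_pbs_script("g4sim", "./run_sim.sh", "02:00:00"))
```

Saved to a file, such a script would typically be submitted with `qsub`.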
Cluster Usage
Software
• PAW, CERNLIB etc.
• Geant4
• ROOT
• Atlas 10.0.1
• FLUKA
• ANSYS, LS-DYNA
Comments - Issues
• Have tightened up security in the last year
• Strict firewall policy, limited machine exemptions
• Blocking scripts prevent ssh access after 3 authentication failures within 1 hour
• Cheap disks allow construction of large disk arrays
• Very happy with SL3 for desktop machines
• Use FC3 for laptops – 2.6 kernel
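The blocking rule above (3 authentication failures within 1 hour) can be sketched as a sliding-window counter. This is not the actual Sheffield script, which the slides do not show; the class name and the way failures are fed in are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque

MAX_FAILURES = 3       # threshold quoted on the slide
WINDOW_SECONDS = 3600  # "within 1 hour"


class FailureTracker:
    """Track failed ssh logins per remote host (hypothetical helper)."""

    def __init__(self, max_failures=MAX_FAILURES, window=WINDOW_SECONDS):
        self.max_failures = max_failures
        self.window = window
        self._failures = defaultdict(deque)  # host -> recent failure times

    def record_failure(self, host, timestamp):
        """Record one failure; return True if the host should be blocked."""
        times = self._failures[host]
        times.append(timestamp)
        # Discard failures that have fallen out of the time window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        return len(times) >= self.max_failures


if __name__ == "__main__":
    tracker = FailureTracker()
    for t in (0, 100, 200):
        blocked = tracker.record_failure("192.0.2.10", t)
    print(blocked)  # True: three failures inside one hour
```

A real deployment would parse failures from the sshd log and act on a `True` result, for example by adding a firewall or TCP-wrapper rule; that part is left out here.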
The Sheffield LCG Cluster
Division of Hardware
• 162 x AMD Opteron 250 (2.4 GHz)
• 4 GB RAM/box (2 GB/CPU)
• 72 GB U320 10K RPM local SCSI disk
• Currently running 32-bit SL303 for maximum compatibility with the grid
• ~2.5 TB storage for experiments
• Middleware: 2.4.0
• Probably the most purple cluster in the grid
Looking Sinister
Status
Usage so far
• We can take quite a bit more.
Monitoring
• Ganglia with modified webfrontend to present queue information
Installation
• Service nodes connected to VPN and Internet
• PXE Installation via VPN allows complete control of dhcpd and named
• RedHat kickstart + post install script
• ssh servers not exposed
• RGMA always the hardest part
• Stumbled across routing rules.
• WN install takes about 30 minutes, can do up to 40 simultaneously.
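A rough estimate of a full reinstall follows from the figures above. The sketch assumes installs run in back-to-back batches of at most 40 nodes at about 30 minutes each, and that the 162 CPUs sit in dual-CPU boxes (4 GB/box at 2 GB/CPU), giving 81 boxes. It also treats every box as a worker node, which overstates the count since some are service nodes.

```python
import math


def full_reinstall_minutes(n_nodes, batch_size=40, minutes_per_batch=30):
    """Estimate wall-clock minutes to reinstall n_nodes in sequential batches."""
    batches = math.ceil(n_nodes / batch_size)
    return batches * minutes_per_batch


if __name__ == "__main__":
    boxes = 162 // 2  # assumed dual-CPU boxes
    print(full_reinstall_minutes(boxes))  # 90
```

Under these assumptions the whole cluster could be reinstalled in roughly an hour and a half.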
Future plans
• Keep up with middleware updates
• Increase available storage as required in ~3-4 TB steps
• Fix things that break
• Try not to mess anything up by screwing around
• Look toward operating with a 64-bit OS.