HEP Computing Status Sheffield University Matt Robinson Paul Hodgson Andrew Beresford.

Posted on 20-Jan-2016




HEP Computing Status

Sheffield University

Matt Robinson

Paul Hodgson

Andrew Beresford

Interactive Cluster

• 30 self-built Linux boxes
• AMD Athlon XP CPUs, 256/512 MB RAM
• OS: Scientific Linux 3.0.3
• 100 Mbit network
• Use NIS for authentication and NFS-mount /home etc.
• System install using kickstart + post-install scripts
• Separate backup machine
• 15 laptops, mostly dual boot
• Some Macs and one Windows box
• 3 disk servers mounted as /data1, /data2, etc. (a few TB)

Batch Cluster

• 100 CPU farm, Athlon XP 2400/2800
• OS: Scientific Linux 3.0.3
• NFS-mounted /home and /data
• OpenPBS batch system for job submission
• Gigabit backbone with 100 Mbit to worker nodes
• Disk server provides 1.3 TB RAID5 as /data
• Entire cluster assembled in house from OEM components for less than 50k
• Hard part was finding an air-conditioned room with sufficient power
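The /data disk server uses RAID5, which spends one disk's worth of space on parity, so usable capacity is (number of disks - 1) x disk size. The slide gives only the 1.3 TB total, so the disk counts and sizes below are hypothetical; this is a minimal sizing sketch, not the actual server layout.

```python
def raid5_usable_gb(num_disks: int, disk_gb: int) -> int:
    """Usable RAID5 capacity: one disk's worth of space holds parity."""
    if num_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (num_disks - 1) * disk_gb


def raid5_disks_needed(target_gb: int, disk_gb: int) -> int:
    """Smallest RAID5 array of equal disks giving at least target_gb usable."""
    data_disks = -(-target_gb // disk_gb)  # integer ceiling division
    return max(data_disks + 1, 3)          # +1 disk for parity, minimum of 3


# Hypothetical example: reaching ~1.3 TB usable with 250 GB disks.
n = raid5_disks_needed(1300, 250)
print(n, raid5_usable_gb(n, 250))  # 7 disks -> 1500 GB usable
```

The same arithmetic applies when planning storage increases: each extra array pays the one-disk parity overhead again.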

Cluster Usage

Software

• PAW, CERNLIB etc.
• Geant4
• ROOT
• ATLAS 10.0.1
• FLUKA
• ANSYS, LS-DYNA

Comments - Issues

• Have tightened up security in the last year
• Strict firewall policy, with a limited number of machine exemptions
• Blocking scripts prevent ssh access after 3 authentication failures within 1 hour
• Cheap disks allow construction of large disk arrays
• Very happy with SL3 for desktop machines
• Use FC3 (Fedora Core 3) for laptops, for the 2.6 kernel
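The ssh blocking rule ("3 authentication failures within 1 hour") is a sliding-window count per remote host. The sketch below shows only that decision logic, assuming failure timestamps (in seconds, non-decreasing) have already been extracted from the ssh logs; the log parsing and the actual blocking action (for example a firewall rule) are not described on the slide and are left out. The class and method names are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # "within 1 hour"
MAX_FAILURES = 3       # "after 3 authentication failures"


class FailureTracker:
    """Hypothetical sliding-window tracker of failed ssh logins per host."""

    def __init__(self, window=WINDOW_SECONDS, max_failures=MAX_FAILURES):
        self.window = window
        self.max_failures = max_failures
        self.failures = defaultdict(deque)  # host -> recent failure times
        self.blocked = set()

    def record_failure(self, host: str, timestamp: float) -> bool:
        """Record one failure; return True if the host is (now) blocked.

        Assumes timestamps arrive in non-decreasing order per host.
        """
        recent = self.failures[host]
        recent.append(timestamp)
        # Discard failures that fall outside the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.blocked.add(host)
        return host in self.blocked


tracker = FailureTracker()
for t in (0, 1000, 2000):
    print(tracker.record_failure("192.0.2.10", t))  # False, False, True
```

A host whose failures are spread more thinly than three per hour is never blocked, because older failures drop out of the window before the count reaches the limit.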

The Sheffield LCG Cluster

Division of Hardware
• 162 x AMD Opteron 250 (2.4 GHz)
• 4 GB RAM/box (2 GB/CPU)
• 72 GB U320 10K RPM local SCSI disk
• Currently running 32-bit SL 3.0.3 for maximum compatibility with the grid
• ~2.5 TB storage for experiments
• Middleware: 2.4.0
• Probably the most purple cluster in the grid

Looking Sinister

Status

Usage so far

• We can take quite a bit more.

Monitoring

• Ganglia with a modified web frontend to present queue information

Installation

• Service nodes connected to both the VPN and the Internet

• PXE installation via the VPN allows complete control of dhcpd and named

• Red Hat kickstart + post-install script

• ssh servers not exposed

• R-GMA always the hardest part

• Ran into problems with routing rules

• A WN (worker node) install takes about 30 minutes; up to 40 can be installed simultaneously
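Because the installation network runs its own dhcpd, each worker node can be given a fixed address and pointed at a PXE boot loader, which then starts the kickstart install. The sketch below generates an ISC dhcpd `host` declaration for one node and estimates wall-clock time for a full reinstall using the slide's figures (about 30 minutes per node, up to 40 at once), assuming nodes go in sequential waves. The node name, MAC, IP addresses and the `pxelinux.0` boot file are hypothetical placeholders, not the site's real configuration.

```python
import math

INSTALL_MINUTES = 30  # "WN install takes about 30 minutes"
MAX_PARALLEL = 40     # "can do up to 40 simultaneously"


def dhcpd_host_entry(name, mac, ip, tftp_server, boot_file="pxelinux.0"):
    """Return an ISC dhcpd 'host' block that PXE-boots one worker node."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {tftp_server};\n"  # TFTP server holding boot_file
        f'  filename "{boot_file}";\n'
        f"}}\n"
    )


def reinstall_minutes(num_nodes, per_node=INSTALL_MINUTES, parallel=MAX_PARALLEL):
    """Wall-clock estimate if nodes are installed in waves of `parallel`."""
    waves = math.ceil(num_nodes / parallel)
    return waves * per_node


# Hypothetical node and addresses, for illustration only.
print(dhcpd_host_entry("wn001", "00:16:3e:00:00:01", "10.1.0.1", "10.1.0.254"))
print(reinstall_minutes(100))  # 3 waves of up to 40 nodes -> 90 minutes
```

Under these assumptions a hypothetical 100-node reinstall needs three waves, about 90 minutes; in practice the limit of 40 may come from TFTP/NFS server load rather than a hard cap.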

Future plans

• Keep up with middleware updates

• Increase available storage as required, in ~3-4 TB steps

• Fix things that break

• Try not to mess anything up by screwing around

• Look toward operating with a 64-bit OS
