HEP Computing Status, Sheffield University. Matt Robinson, Paul Hodgson, Andrew Beresford.

Page 1: HEP Computing Status

Sheffield University

Matt Robinson

Paul Hodgson

Andrew Beresford

Page 2: Interactive Cluster

• 30 self-built Linux boxes
• AMD Athlon XP CPUs, 256/512 MB RAM
• OS: Scientific Linux 303
• 100 Mbit network
• NIS for authentication, NFS-mounted /home etc.
• System install using kickstart + post-install scripts
• Separate backup machine
• 15 laptops, mostly dual-boot
• Some Macs and one Windows box
• 3 disk servers mounted as /data1, /data2, etc. (a few TB)
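On a client, NFS mounts of the kind listed above are typically expressed as /etc/fstab entries; a sketch (server and export names are hypothetical, not from the talk):

```
# NIS-authenticated clients NFS-mount home and data areas from central servers
homeserv:/export/home   /home    nfs  rw,hard,intr  0 0
dserv1:/export/data1    /data1   nfs  rw,hard,intr  0 0
dserv2:/export/data2    /data2   nfs  rw,hard,intr  0 0
```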

Page 3: Batch Cluster

• 100-CPU farm, Athlon XP 2400/2800
• OS: Scientific Linux 303
• NFS-mounted /home and /data
• OpenPBS batch system for job submission
• Gigabit backbone with 100 Mbit to worker nodes
• Disk server provides 1.3 TB as /data (RAID 5)
• Entire cluster assembled in house from OEM components for less than 50k
• Hard part was finding an air-conditioned room with sufficient power
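Submission to an OpenPBS farm like this is normally done with a small shell script handed to qsub; a minimal sketch (the job name, resource requests, and binary are illustrative, not from the talk):

```shell
#!/bin/sh
#PBS -N g4sim             # job name (hypothetical)
#PBS -l nodes=1:ppn=1     # one CPU slot on one worker node
#PBS -l walltime=12:00:00 # wall-clock limit
#PBS -j oe                # merge stdout and stderr into one output file

cd "$PBS_O_WORKDIR"       # run from the directory qsub was called in
./run_simulation input.dat
```

Submitted with `qsub job.sh`; the merged output file is returned to the submission directory on the NFS-mounted /home.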

Page 4: Cluster Usage

Page 5: Software

• PAW, CERNLIB, etc.
• Geant4
• ROOT
• ATLAS 10.0.1
• FLUKA
• ANSYS, LS-DYNA

Page 6: Comments - Issues

• Have tightened up security in the last year
• Strict firewall policy, limited machine exemptions
• Blocking scripts prevent ssh access after 3 authentication failures within 1 hour
• Cheap disks allow construction of large disk arrays
• Very happy with SL3 for desktop machines
• Use FC3 for laptops (2.6 kernel)
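The talk's blocking scripts presumably watch the auth logs; an in-kernel alternative with a similar effect can be sketched with iptables' recent module (an illustration, not necessarily the mechanism used here, and note it counts new connections rather than authentication failures):

```shell
# Remember each source address that opens a new ssh connection
iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
         -m recent --name SSH --set
# Drop the 4th new connection from the same address within 3600 s
iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
         -m recent --name SSH --update --seconds 3600 --hitcount 4 -j DROP
```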

Page 7: The Sheffield LCG Cluster

Page 8: Division of Hardware

• 162 x AMD Opteron 250 (2.4 GHz)
• 4 GB RAM/box (2 GB/CPU)
• 72 GB U320 10K RPM local SCSI disk
• Currently running 32-bit SL303 for maximum compatibility with the grid.

• ~2.5 TB storage for experiments.

• Middleware: 2.4.0
• Probably the most purple cluster in the grid.

Page 9: Looking Sinister

Page 10: Status

Page 11: Usage so far

• We can take quite a bit more.

Page 12: Monitoring

• Ganglia with a modified web frontend to present queue information
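Feeding batch-queue state into a Ganglia front end mostly means summarizing qstat output; a small sketch of the parsing step (qstat's default listing carries the one-letter job state, R for running, Q for queued, in column 5 of each data row):

```shell
# qstat_counts: read a default qstat listing on stdin and print
# "state count" lines, sorted by state letter.
qstat_counts() {
  awk 'NR > 2 { count[$5]++ }                       # skip the two header lines
       END    { for (s in count) print s, count[s] }' | sort
}
```

Usage: `qstat | qstat_counts`, whose output can then be injected into the web page alongside Ganglia's load graphs.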

Page 13: Installation

• Service nodes connected to VPN and Internet

• PXE Installation via VPN allows complete control of dhcpd and named

• RedHat kickstart + post install script

• ssh servers not exposed

• R-GMA always the hardest part

• Stumbled across routing rules.

• WN install takes about 30 minutes, can do up to 40 simultaneously.
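PXE control of worker-node installs of the kind described rests on a dhcpd.conf fragment that points booting nodes at a TFTP-served pxelinux, which in turn chains to the kickstart install; a sketch (subnet and addresses are hypothetical):

```
subnet 192.168.100.0 netmask 255.255.255.0 {
  range 192.168.100.10 192.168.100.200;   # worker nodes
  next-server 192.168.100.1;              # TFTP server holding the boot image
  filename "pxelinux.0";                  # PXE boot loader
}
```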

Page 14: Future plans

• Keep up with middleware updates

• Increase available storage as required in ~3-4 TB steps

• Fix things that break

• Try not to mess anything up by screwing around

• Look toward operating with 64 bit OS.
