Oxford PP Computing Site Report, HEPSYSMAN, 28th April 2003, Pete Gronbech

Page 1

Oxford PP Computing Site Report

HEPSYSMAN

28th April 2003

Pete Gronbech

Page 2

General Strategy

• Approx. 200 Windows 2000 desktop PCs, with Exceed used to access central Linux systems

• Digital Unix and VMS phased out for general use.

• Red Hat Linux 7.3 is becoming the standard

Page 3

Network Access

[Diagram: campus network. Components shown: Super Janet 4 (2.4Gb/s link), OUCS Firewall, Campus Backbone Router, two Backbone Edge Routers, departments (depts), Physics Firewall and Physics Backbone Router. Links are labelled 100Mb/s or 1Gb/s.]

Page 4

Physics Backbone Upgrade to Gigabit, Autumn 2002

[Diagram: Physics network. Components shown: Physics Firewall, Physics Backbone Router, server Gb/s switch, Linux server, Win 2k server, desktops, and the Particle Physics, Clarendon Lab, Astro, Theory and Atmos sub-departments. Links are labelled 1Gb/s or 100Mb/s.]

Page 5

[Diagram: Particle Physics Linux systems on the 1Gb/s network. Hosts shown: pplx1, morpheus, pplxfs1, pplxgen, pplx2, ppcresst1, ppcresst2, ppatlas1, atlassbc, ppminos1, ppminos2, grid, pplxbatch, pptb01, pptb02, pplx3 (SNO), ppnt117 (HARP), tblcfg, tbse01, tbce01. Group labels: General Purpose Systems, Grid Development, CDF, minos DAQ, Atlas DAQ, cresst DAQ, edg uisam testing. Each host is labelled with its OS: RH6.2, RH7.1, RH7.3 or Fermi 7.3.1. Added Autumn 2002: PBS batch farm of 4 * dual 2.4GHz systems running RH7.3.]

Page 6

[Diagram: General Purpose Systems on the 1Gb/s network. Hosts shown: pplxfs1, pplxgen, pplx2, with OS labels RH7.3, RH7.3 and RH6.2. Added Autumn 2002: PBS batch farm of 4 * dual 2.4GHz systems running RH7.3.]

Page 7

Zero-D X-3i SCSI-IDE RAID

12 * 160GB Maxtor drives

Supplied by Compusys

This proved to be a disaster and was rejected in favour of bare SCSI disks, which we mounted internally in our rack-mounted file server.

Page 8

The Linux File Server: pplxfs1

8 * 146GB SCSI disks

Page 9

General Purpose Linux Server: pplxgen

pplxgen is a dual 2.2GHz Pentium 4 Xeon based system with 2GB RAM. It is running Red Hat 7.3. It was brought online at the end of August 2002 to share the load with pplx2 as users migrated off al1 (the Digital Unix server).

Page 10

The PP batch farm, running Red Hat 7.3 with OpenPBS, can be seen below pplxgen.

This service became fully operational in Feb 2003.
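
As a rough illustration of what a job for an OpenPBS farm of this kind might look like, the sketch below builds a job script using standard `#PBS` directives. The helper `render_pbs_script`, the job name, the command and the resource values are all hypothetical and are not taken from the Oxford setup; `ppn=2` simply reflects the dual-CPU nodes described above.

```python
def render_pbs_script(job_name, command, nodes=1, ppn=2, walltime="01:00:00"):
    """Return the text of a PBS job script (all values are illustrative)."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",               # job name shown by qstat
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit for the job
        "#PBS -j oe",                        # merge stderr into stdout
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: the name and command are placeholders.
script = render_pbs_script("mc_test", "./run_simulation.sh")
print(script)
```

A script like this would normally be saved to a file and submitted with `qsub`; the queue and resource limits actually configured on the farm are not stated here.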

Page 11

[Diagram, February 2003: CDF and Grid Development systems on the 1Gb/s network. Hosts shown: pplx1 (new), morpheus, matrix, node1, node9, cdfsam, grid, pplxbatch, pptb01, pptb02, tblcfg, tbse01, tbce01, tbwn01, tbwn02, tbgen01. Group labels: CDF, Grid Development, LHCB MC, edg uisam testing. OS labels: Fermi 7.3.1, RH6.1, RH6.2, RH7.1, RH7.3.]

Page 12

Grid development systems, including the EDG software testbed setup.

Page 13

New Linux Systems

Morpheus is an IBM x370 8-way SMP 700MHz Xeon with 4GB RAM and 1TB of Fibre Channel disks. Installed August 2001.

Purchased as part of a JIF grant for the CDF group.

Runs Red Hat 7.1.

Will use CDF software developed at Fermilab and here to process data from the CDF experiment.

Page 14

Tape backup is provided by a Qualstar TLS4480 tape robot with 80 slots and dual Sony AIT3 drives. Each tape can hold 100GB of data. Installed January 2002.

NetVault software from BakBone is used, running on morpheus, for backup of both CDF and particle physics systems.
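
The figures above imply an upper bound on what the library can hold at one time. The short calculation below is only a sketch: it takes the 80 slots and 100GB per tape from the slide and assumes uncompressed data; the `reserved_slots` parameter (for example, slots given over to cleaning tapes) is a hypothetical addition.

```python
SLOTS = 80              # slots in the tape robot (from the slide)
TAPE_CAPACITY_GB = 100  # capacity of one tape (from the slide)


def library_capacity_gb(slots=SLOTS, tape_gb=TAPE_CAPACITY_GB, reserved_slots=0):
    """Capacity of a fully loaded library, in GB, without compression."""
    if not 0 <= reserved_slots <= slots:
        raise ValueError("reserved_slots must be between 0 and slots")
    return (slots - reserved_slots) * tape_gb


full = library_capacity_gb()
print(f"Fully loaded library: {full} GB ({full / 1000:.1f} TB)")
```

With every slot holding a data tape this gives 8000GB, i.e. about 8TB online in the robot.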

Page 15

Second round of CDF JIF tender: Dell Cluster - MATRIX

10 dual 2.4GHz P4 Xeon servers running Fermi Linux 7.3.1 and SCALI cluster software. Installed December 2002.

Page 16

Approx. 7.5TB of SCSI RAID 5 disks are attached to the master node.

Each shelf holds 14 * 146GB disks.


These are shared via NFS with the worker nodes.

OpenPBS batch queuing software is used.
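
The disk figures above can be cross-checked with a little arithmetic. The sketch below assumes each shelf is a single RAID 5 set with no hot spares, so one disk's worth of space per shelf goes to parity; the number of shelves is not stated on the slide, and the value of 4 used here is an assumption chosen because it lands close to the quoted 7.5TB.

```python
DISKS_PER_SHELF = 14  # from the slide
DISK_GB = 146         # from the slide
ASSUMED_SHELVES = 4   # assumption: not stated in the slides


def raid5_usable_gb(disks, disk_gb):
    """Usable space of one RAID 5 set: one disk's worth is used for parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


raw_per_shelf = DISKS_PER_SHELF * DISK_GB
usable_per_shelf = raid5_usable_gb(DISKS_PER_SHELF, DISK_GB)
total_usable = ASSUMED_SHELVES * usable_per_shelf

print(f"Raw per shelf:    {raw_per_shelf} GB")
print(f"Usable per shelf: {usable_per_shelf} GB")
print(f"Usable total:     {total_usable} GB with {ASSUMED_SHELVES} shelves")
```

Under these assumptions a shelf has 2044GB raw and 1898GB usable, and four shelves give 7592GB, which is consistent with the "approx. 7.5TB" figure. A different RAID layout or the use of hot spares would change the result.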

Page 17

Plenty of space in the second rack for expansion of the cluster.

Page 18

LHCb Monte Carlo Setup

Compute Node: 8-way 700MHz Xeon server (RH6.2, OpenAFS, OpenPBS)

Grid Gateway: grid (RH6.2, Globus 1.1.3, OpenAFS, OpenPBS)

The 8-way SMP has now been reloaded as an MS Windows Terminal Server, and LHCb MC jobs will be run on the new PP farm.

Page 19

Problems

• IDE RAID proved to be unreliable and caused lots of downtime.

• Problems with NAT: using iptables caused NFS problems and hangs. Solved by dropping NAT and using real IP addresses for the PP farm.

• Trouble with ext3 journal errors.

• Hackers…

Page 20

Problems

• Lack of manpower!

• Number of operating systems slowly reducing; Digital Unix and VMS very nearly gone. NT4 also practically eliminated.

• Getting closer to standardising on RH 7.3, especially as the EDG software is now heading that way.

• Still finding it very hard to support laptops, but now have a standard clone and recommend IBM laptops.

• Would be good to have more time to concentrate on security… (See later talk)