Computer Hardware and Procurement at CERN
Helge Meinhard (at) cern.ch
HEPiX fall 2005 @ SLAC
Helge Meinhard (at) cern.ch, HEPiX@SLAC: Hardware procurement at CERN
Outline
  Procedures
  Hardware (being) procured
  Power measurements
  Observations
Constraints (1)
  CERN is an international organisation with strict administrative rules
    Competitive tendering required, covering (at least) the member states
    No way to avoid it for commodity equipment
  Lowest compliant bid wins
    No negotiations about added value of higher offers
Constraints (2)
  Different procedures depending on expected volume
    < 10’000 CHF: IT seeks 3 offers
    < 200’000 CHF: Formal price enquiry by purchasing service. Four weeks response time
    < 750’000 CHF: Formal call for tender preceded by market survey. Six weeks response time
    > 750’000 CHF: As < 750’000 CHF, plus approval by CERN’s Finance Committee (5 sessions/year, papers ready two months in advance)
  (1 CHF = 0.78 USD = 0.65 EUR)
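The volume thresholds above amount to a simple lookup. A minimal sketch in Python; the function name `procurement_procedure` is hypothetical, and the treatment of amounts exactly on a threshold is an assumption, since the slide only gives "<" and ">":

```python
def procurement_procedure(volume_chf: float) -> str:
    """Return the procedure that applies to an expected purchase volume in CHF."""
    if volume_chf < 10_000:
        return "IT seeks 3 offers"
    if volume_chf < 200_000:
        return "Formal price enquiry by purchasing service (4 weeks response time)"
    if volume_chf < 750_000:
        return "Formal call for tender preceded by market survey (6 weeks response time)"
    # Assumption: a volume of exactly 750'000 CHF is treated like the top tier.
    return "Formal call for tender plus approval by Finance Committee"


if __name__ == "__main__":
    for volume in (5_000, 150_000, 600_000, 1_200_000):
        print(f"{volume:>9} CHF: {procurement_procedure(volume)}")
```

Checking the tiers in ascending order keeps each condition to a single comparison.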
Our problems
  Procedures badly adapted to quickly evolving computing market
  Difficult to give preference to “good”, reliable equipment
Our choices (1)
  For significant purchases (> 100 kCHF) we require (a) sample system(s)
    with the tender for big tenders
    on CERN’s request for small tenders
  Tenders include 3 years on-site warranty for hardware
    Typical requirements:
      4 working hours response / 12 working hours repair for critical machines
      3 working days response / 5 working days repair for farm nodes
    Supplier can subcontract on-site warranty
Our choices (2)
  Payment within 30 days after provisional acceptance, on receipt of a bank guarantee of 5% of the purchase sum, valid until the end of the warranty period
  Delivery within 6 weeks; penalty for late delivery: 2% of purchase sum per complete week, max. 10%
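The late-delivery penalty is simple arithmetic: 2% of the purchase sum per complete week of delay, capped at 10%. A minimal sketch; the function name and the choice to count delay in calendar days are assumptions made for illustration:

```python
def late_delivery_penalty(purchase_sum_chf: float, days_late: int) -> float:
    """Penalty in CHF: 2% per complete week of delay, at most 10%."""
    complete_weeks = max(days_late, 0) // 7       # partial weeks do not count
    percent = min(2 * complete_weeks, 10)         # cap at 10% of the purchase sum
    return purchase_sum_chf * percent / 100


if __name__ == "__main__":
    for days in (6, 20, 70):
        print(days, "days late:", late_delivery_penalty(500_000, days), "CHF")
```

With these rules the cap is reached after five complete weeks of delay.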
Our choices (3)
  If more than 10% of systems fail during acceptance or during the first month after: right to return the whole batch
  If a system fails 3 or more times during any 6 months’ period: right to request complete replacement of the system
  If more than 20% of any component fails during any 6 months’ period: right to request complete replacement of this component across the batch
  If CERN adds third-party devices: no impact on warranty obligations for the system as delivered
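The three failure clauses can be expressed as small predicates. This is only a sketch of the thresholds quoted above: the function names are hypothetical, and approximating "any 6 months' period" as 183 days is an assumption.

```python
from datetime import date, timedelta

# Assumption: "any 6 months' period" is approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def may_return_batch(failed_systems: int, batch_size: int) -> bool:
    """More than 10% of systems failed during acceptance or the first month after."""
    return failed_systems * 10 > batch_size


def may_replace_system(failure_dates: list[date]) -> bool:
    """The system failed 3 or more times within some 6-month window."""
    dates = sorted(failure_dates)
    return any(dates[i + 2] - dates[i] <= SIX_MONTHS for i in range(len(dates) - 2))


def may_replace_component(failed_units: int, installed_units: int) -> bool:
    """More than 20% of one component failed; counts must refer to one 6-month window."""
    return failed_units * 5 > installed_units
```

For example, `may_replace_system([date(2005, 1, 10), date(2005, 3, 1), date(2005, 6, 20)])` is true, because the first and third failure lie 161 days apart.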
Our choices (4)
  If justified by volume, procure from two suppliers (lowest and second-lowest compliant)
    Better protection if one delivers crap or nothing at all
    Better chance for companies to win an order
    Increased workload on our part
Example of a procurement
  Procurement of equipment worth < 750 kCHF
    Approval by Finance Committee not needed
  Market survey already done
    Market survey can cover different types of equipment
    Valid for 1 year
    If not done yet, add ~ 16 weeks
Steps (1)
  Typical case:
    Fix scope                             2 w
    Write technical, commercial docs      3 w
    IT-internal review
    Revise technical, commercial docs     2 w
    Specification meeting
    Revise technical, commercial docs     1 w
    Tender out
    Deadline for replies                  6 w
    Opening of replies                    1 w
  (Total so far: 15 weeks, at best compressible to 12 weeks)
Steps (2)
  Typical case:
    (Total from previous slide: 15 w, min. 12 w)
    Technical analysis of replies         1 w
    Visual inspection, mounting           1 w
    Benchmarks, reports                   3 w
    Technical clarifications              1 w
    Purchase request, order               2 w
    Delivery                              7 w
    Preliminary acceptance                6 w
  Total: 36 weeks, compressible to 30 weeks
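The totals quoted on the two "Steps" slides can be checked by adding up the step durations. A small sketch with the durations copied from the slides; the variable and function names are illustrative only, and the milestones without a duration (IT-internal review, specification meeting, tender out) are left out:

```python
# (step, duration in weeks), typical case as listed on the slides
STEPS_UNTIL_OPENING = [
    ("Fix scope", 2),
    ("Write technical, commercial docs", 3),
    ("Revise docs after IT-internal review", 2),
    ("Revise docs after specification meeting", 1),
    ("Deadline for replies", 6),
    ("Opening of replies", 1),
]
STEPS_AFTER_OPENING = [
    ("Technical analysis of replies", 1),
    ("Visual inspection, mounting", 1),
    ("Benchmarks, reports", 3),
    ("Technical clarifications", 1),
    ("Purchase request, order", 2),
    ("Delivery", 7),
    ("Preliminary acceptance", 6),
]
MARKET_SURVEY_WEEKS = 16  # "add ~ 16 weeks" if the market survey is not done yet


def total_weeks(steps):
    """Sum the durations of a list of (name, weeks) pairs."""
    return sum(weeks for _, weeks in steps)


if __name__ == "__main__":
    first = total_weeks(STEPS_UNTIL_OPENING)
    full = first + total_weeks(STEPS_AFTER_OPENING)
    print("Until opening of replies:", first, "weeks")
    print("Until preliminary acceptance:", full, "weeks")
    print("Including a new market survey: about", full + MARKET_SURVEY_WEEKS, "weeks")
```

The sums reproduce the 15 and 36 weeks stated on the slides; a procurement that also needs a new market survey comes to roughly a year.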
Objectives
  Cover existing needs with as few different models and as few procurement procedures as possible
  Closely follow technology and market evolution and satisfy requirements with modern hardware at low cost
  (These two objectives contradict each other)
Fabric Infrastructure and Operations (1)
  RedHat 7.3 phased out on public services
    Campaign on storage nodes far advanced
  New in machine room since Karlsruhe:
    200 farm PCs (dual Nocona): in production
    116 disk servers (> 5 TB usable each, total of 900 TB gross capacity): part in production, part under acceptance test
    112 “midrange servers”: under acceptance test
    32-node Infiniband-based cluster for Theory
  Refurbishment of machine room proceeding
    LHS being populated, but power remains limited
Talk
From CERN site report 2005/10/11
Hardware being procured (1)
  Large volumes – several times < 750 kCHF per year
    “Farm PCs” – non-redundant, cheap dual-processor work horses
    “Disk servers” – storage-in-a-box systems with many SATA disks for streaming applications
Hardware being procured (2)
  Medium-size volumes – once < 750 kCHF per year or once or several times < 200 kCHF per year
    “Midrange servers” – redundant building blocks for specific applications
    “Tape servers” – midrange servers with an FC interface
    “Disk arrays” – autonomous RAID units with FC uplinks
    SAN infrastructure (most notably FC switches)
    Head nodes for serial console infrastructure
    “Small disk servers”, somewhere between disk servers and midrange servers
    Miscellaneous
Specifications: Farm PCs (1)
  2 boxed Intel Noconas of 2.8 GHz
  Mainboard:
    BMC (IPMI 1.5 or higher)
    PXE, USB boot
    BBS menu
    Console redirection
    Configurable to stay off on AC power loss
  2 GB ECC memory
    From mainboard manuf. approved list
    Upgradable to 4 GB without removing modules
Specifications: Farm PCs (2)
  1 disk > 140 GB, IDE not permitted
    Certified for 24/7, 3 y warranty by disk manuf.
  1 GigE providing PXE and IPMI access
  19” chassis max. 4 U, with rails
    Power, reset button
    Power, disk activity LED
  Power supply supporting machine + 50 W
    Active PFC
    C13 to C14 LSZH power cord
  Guaranteed to run under RHEL 3 (i386 and x86_64)
  Delivery within 6 weeks from dispatch of order
Specifications: Disk server (1)
  1 or 2 boxed Intel Xeon with EM64T
  Mainboard as for Farm PCs
    Now adding support for memory mirroring
  Memory as for Farm PCs
  General requirements for disks etc.
    ≥ 7200 rpm, no EIDE, 3 y warranty, certified for 24/7 by manufacturer
    Metallic hot-swap trays certified by chassis manuf.
    Indicators for power and activity for each tray
    PCB backplanes for disks, multilane cabling
    “Intelligent” RAID controllers
Specifications: Disk server (2)
  System disks: 2 x ≥ 140 GB mirrored
  Data disks: all identical
    Redundant RAIDs with hot spares (min. 1/15)
    Total usable capacity per system above 5 TB
    Battery buffer if controller with active cache
  1 GigE providing required performance, PXE, IPMI access
  19” chassis rack-mountable with rails
    Min. 40 TB usable in 42 U high rack
  Power supply: N+1 redundant, active PFC
  Guaranteed to run under RHEL 3 (i386 and x86_64)
  Delivery within 6 weeks from dispatch of order
Specifications: Disk server (3)
  Performance:
    Memory to disk: iozone with 16 GB files and 256 kb record size
      Single stream: 40 MB/s write, 40 MB/s read
      Multi-stream (at least 10): 115 MB/s write, 170 MB/s read (*)
    Memory to network: iperf
      Single stream: 100 MB/s write, 100 MB/s read
      Two streams: 110 MB/s write, 110 MB/s read
      Two streams in, two streams out: 145 MB/s
Specifications: Disk server (4)
  Global (disk to network) performance:
    At least 10 clients transferring 2 GB files via rfio
    Reading from system: 95 MB/s (*)
    Writing to system: 90 MB/s (*)
  (*): Requirements scale linearly with usable capacity, numbers for 5000 GB usable
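The linear scaling of the starred requirements can be written down directly. A sketch assuming the scaling is a plain proportion relative to the 5000 GB reference; the dictionary keys and the function name are illustrative, not taken from the tender documents:

```python
REFERENCE_CAPACITY_GB = 5000

# Starred (*) requirements in MB/s, quoted for 5000 GB usable capacity
STARRED_REQUIREMENTS_MB_S = {
    "iozone multi-stream write": 115,
    "iozone multi-stream read": 170,
    "rfio read from system": 95,
    "rfio write to system": 90,
}


def scaled_requirements(usable_gb: float) -> dict[str, float]:
    """Scale each starred requirement linearly with the usable capacity."""
    return {
        name: base * usable_gb / REFERENCE_CAPACITY_GB
        for name, base in STARRED_REQUIREMENTS_MB_S.items()
    }


if __name__ == "__main__":
    for name, mb_s in scaled_requirements(6000).items():
        print(f"{name}: {mb_s:.0f} MB/s")
```

Under this reading, a system offering 6000 GB usable would have to deliver 1.2 times the quoted rates, for instance 204 MB/s for multi-stream iozone reads.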
Power measurements
                                                Switched off    Booting (peak)  Idle            Full load       SpecInt2k  SI2k/watt (full load)
System                                          W       VA      W       VA      W       VA      W       VA
Xeon 2x 2.8 GHz                                 9       18      150     150     110     112     225     230     1950       8.67
Nocona 2x 3.2 GHz (IBM Eserver xSeries 336)     47      70      250     283     173     199     266     292     3030       11.39
Opteron 2x 246 (IBM Eserver 326)                10      33      164     180     154     170     184     201     2316       12.59
Opteron dual-core 2x2x 265                      9       26      175     190     163     180     208     228     5454       26.22
Estimated Pentium D system with 1 CPU, 2 cores  -       -       -       -       154     -       244     -       2800       11.48
http://ahorvath.home.cern.ch/ahorvath/power
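The last column of the table is the SpecInt2k rating divided by the power drawn at full load in watts. A short sketch that recomputes it from the measured values; the dictionary layout and function name are chosen for illustration:

```python
# system: (full-load power in W, SpecInt2k), values from the table above
MEASUREMENTS = {
    "Xeon 2x 2.8 GHz": (225, 1950),
    "Nocona 2x 3.2 GHz": (266, 3030),
    "Opteron 2x 246": (184, 2316),
    "Opteron dual-core 2x2x 265": (208, 5454),
    "Pentium D (estimated)": (244, 2800),
}


def si2k_per_watt(full_load_w: float, specint2k: float) -> float:
    """Performance per watt at full load."""
    return specint2k / full_load_w


if __name__ == "__main__":
    for system, (watts, si2k) in MEASUREMENTS.items():
        print(f"{system}: {si2k_per_watt(watts, si2k):.2f} SI2k/W")
```

Rounded to two decimals, the results agree with the values given in the table.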
Observations (1)
  Profile of winning companies
    Tier-1 suppliers competing with large integrators
    Small “round the corner” companies eliminated at Market Survey stage
  Almost always the integrators win
    Specially tailored solutions responding to our specifications
    Prices of Tier-1s rather high in Europe
Observations (2)
  Stress test as (important) part of the acceptance test
    Introduced ~ 2 years ago (triggered by presentations from SLAC and FNAL at HEPiX)
    Very useful
  Based on va-ctcs
    No longer sufficiently actively maintained
    Large number of false positives
    Looking for a replacement
Observations (3)
  Pushing these procedures through requires dedicated (and knowledgeable) person power
  Not obvious to run multiple procedures in parallel
    In particular, if things go wrong, e.g. stress test fails
Summary
  Computer hardware procurement is an excellent experimental confirmation of two fundamental laws of human nature
    Murphy: “Everything that can go wrong will go wrong.”
    Hofstadter: “Things always take longer than you think, even if you take into account Hofstadter’s law.”