Planning the LCG Fabric at CERN
openlab TCO Workshop, November 11th 2003
Tony.Cass@CERN.ch
Fabric Area Overview
– Infrastructure: electricity, cooling, space
– Network
– Batch system (LSF, CPU server)
– Storage system (AFS, CASTOR, disk server)
– Purchase, hardware selection, resource planning
– Installation, configuration + monitoring, fault tolerance
– Prototype, testbeds
– Benchmarks, R&D, architecture
– Automation, operation, control
– GRID services!?
Coupling of components through hardware and software
Agenda
Building Fabric
Batch Subsystem
Storage subsystem
Installation and Configuration
Monitoring and control
Hardware Purchase
Building Fabric I
B513 was constructed in the early 1970s and the machine room infrastructure has evolved slowly over time.
– Like the eye, the result is often not ideal…
Current Machine Room Layout
Problem: normabarres run one way, services run the other…
[Figure: current machine room floor plan, with four areas labelled Services.]
Future Machine Room Layout
– 528 box PCs: 105 kW
– 1440 1U PCs: 288 kW
– 324 disk servers: 120 kW (?)
– 18 m double rows of racks: 12 shelf units or 36 19” racks
– 9 m double rows of racks for critical servers
– Aligned normabarres
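The load figures quoted for the future layout imply an average draw per unit, which is a handy planning number. The sketch below uses only the counts and kW totals from the slide; the dictionary layout and function names are illustrative and not part of the talk, and the disk server figure is marked uncertain in the source.

```python
# Equipment counts and total loads (kW) as quoted on the slide.
# The disk server load carries a "(?)" in the source, so treat it as uncertain.
PLANNED_LOAD = {
    "box PCs": (528, 105.0),
    "1U PCs": (1440, 288.0),
    "disk servers": (324, 120.0),
}


def watts_per_unit(count, total_kw):
    """Average draw of a single unit, in watts."""
    return total_kw * 1000.0 / count


def total_load_kw(load):
    """Sum of the quoted loads, in kW."""
    return sum(kw for _count, kw in load.values())


if __name__ == "__main__":
    for name, (count, kw) in PLANNED_LOAD.items():
        print(f"{name}: {watts_per_unit(count, kw):.0f} W per unit")
    print(f"total: {total_load_kw(PLANNED_LOAD):.0f} kW")
```

On these numbers the PCs come out at roughly 200 W each, and the disk servers at noticeably more.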
With the preparations for LHC we have the opportunity to remodel the infrastructure.
– Arrange services in clear groupings associated with power and network connections.
» Clarity for general operations plus ease of service restart should there be any power failure.
– Isolate critical infrastructure such as networking, mail and home directory services.
– Clear monitoring of planned power distribution system.
Just “good housekeeping”, but we expect to reap the benefits during LHC operation.
Building Fabric II
Beyond good housekeeping, though, there are building fabric issues that are intimately related with recurrent equipment purchase.
– Raw power: We can support a maximum equipment load of 2.5MW. Does the recurrent additional cost of blade systems avoid investment in additional power capacity?
– Power efficiency: Early PCs had power factors of ~0.7 and generated high levels of 3rd harmonics. Fortunately, we now see power factors of 0.95 or better, avoiding the need to install filters in the PDUs. Will this continue?
– Many sites need to install 1U or 2U rack mounted systems for space reasons. This is not a concern for us at present but may become so eventually.
» There is a link here to the previous point: the small power supplies for 1U systems often have poor power factors.
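To see why the power factor matters for the building fabric, recall that the apparent power the distribution system must carry is the real power divided by the power factor. The sketch below applies that relation to the two power factors quoted above. Treating the 2.5 MW equipment load as real power is an assumption made here for illustration, and the function name is hypothetical.

```python
def apparent_power_kva(real_power_kw, power_factor):
    """Apparent power (kVA) needed to deliver a given real power (kW)."""
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power factor must be in (0, 1]")
    return real_power_kw / power_factor


if __name__ == "__main__":
    load_kw = 2500.0  # assumption: the 2.5 MW limit read as real power
    for pf in (0.7, 0.95):
        kva = apparent_power_kva(load_kw, pf)
        print(f"power factor {pf}: {kva:.0f} kVA")
```

This ignores the harmonic content mentioned on the slide; it only shows how much extra capacity a poor power factor consumes.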
Fabric Architecture
Level of complexity: physical and logical coupling
– CPU, disk → PC; storage tray, NAS server, SAN element
» Hardware: motherboard, backplane, bus, integrating devices (memory, power supply, controller, ...)
» Software: operating system, driver
– PC, storage elements → cluster
» Hardware: network (Ethernet, Fibre Channel, Myrinet, …), hubs, switches, routers
» Software: batch system, load balancing, control software, hierarchical storage systems
– Cluster → world wide cluster
» Hardware: wide area network
» Software: Grid middleware
Batch Subsystem
Looking purely at batch system issues, TCO is reduced as the efficiency of node usage increases. What are the dependencies?
– The load characteristics
» Not much we in IT can do here!
– The batch scheduler
» LSF is pretty good here, fortunately.
– Chip technology
» Take hyperthreading, for example. Tests have shown that, for HEP codes at least, hyperthreading wastes 20% of the system performance running two tasks on a dual processor machine. There are no clear benefits to running with hyperthreading enabled when running three tasks. What is the outlook here?
– Processors/box
» At present, a single 100baseT NIC would support the I/O load of a quad processor CPU server. Quad processor boxes would halve the cost of networking infrastructure, but they come at a hefty price premium (XEON MP vs XEON DP, heftier chassis, …). What is the outlook here?
» And total system memory becomes an issue.
– The operating system
» Linux is getting better, but things such as processor affinity would be nice.
» Relationship to hyperthreading…
– Others?
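The processors/box argument is simple arithmetic: if one NIC per box is enough, the number of switch ports scales with the number of boxes, not the number of CPUs. A minimal sketch follows; the total CPU count is a made-up figure for illustration and the function name is hypothetical.

```python
import math


def boxes_needed(total_cpus, cpus_per_box):
    """Boxes (and hence NICs and switch ports, one per box) for a CPU count."""
    if total_cpus < 0 or cpus_per_box < 1:
        raise ValueError("need a non-negative CPU count and at least one CPU per box")
    return math.ceil(total_cpus / cpus_per_box)


if __name__ == "__main__":
    total_cpus = 2000  # hypothetical farm size, not a figure from the talk
    for cpus_per_box in (2, 4):
        ports = boxes_needed(total_cpus, cpus_per_box)
        print(f"{cpus_per_box} CPUs/box: {ports} switch ports")
```

Moving from dual to quad processor boxes halves the port count, which is the networking saving the slide weighs against the price premium of the larger systems.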
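The processor affinity wished for under the operating system dependency of the batch subsystem means binding a task to a specific CPU so the scheduler does not move it. Linux now exposes this through the `sched_setaffinity` system call, which Python wraps as `os.sched_setaffinity`. The sketch below is illustrative only: the round-robin plan and both function names are assumptions made here, not something the batch system in the talk is known to do.

```python
import os


def plan_affinity(n_tasks, cores):
    """Assign task indices to core ids round-robin."""
    if not cores:
        raise ValueError("need at least one core")
    return {task: cores[task % len(cores)] for task in range(n_tasks)}


def pin_current_process(core):
    """Pin the calling process to one core.

    Returns False on platforms without os.sched_setaffinity (it is
    Linux-specific); may raise OSError if the core id is not usable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return True


if __name__ == "__main__":
    # Two tasks on a dual processor machine: one task per physical CPU.
    print(plan_affinity(2, [0, 1]))
```

A batch job wrapper could call `pin_current_process` with the core chosen by `plan_affinity` before starting the payload.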
Storage subsystem
Simple building blocks:
– Processors: “desktop+” node == CPU server
– CPU server + larger case + 6*2 disks == Disk server
– CPU server + Fibre Channel interface + tape drive == Tape server
Storage subsystem: Disk Storage
TCO: Maximise available online capacity within fixed budget (material & personnel).
– IDE based disk servers are much cheaper than high end SAN servers. But are we spending too much time on maintenance?
» Yes, at present, but we need to analyse carefully the reasons for the current load.
» Complexities of Linux drivers seem under control, but numbers have exploded. And are some problems related to batch of hardware?
– Where is the optimum? Switching to fibre channel disks would reduce capacity by factor of ~5.
» Naively, buy, say, 10% extra systems to cover failures. Sadly, this is not as simple as for CPU servers; active data on the servers must be reloaded elsewhere.
» Always have duplicate data? => purchase 2x required space. Still cheaper than SAN? How does this relate to …
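The disk options above can be compared as usable capacity for the same hardware budget, normalised so that plain IDE servers deliver 1.0. This is a rough sketch: it ignores the personnel cost that the slide counts as part of the budget, it assumes the fibre channel option needs no duplication, and the function names are hypothetical.

```python
def usable_with_spares(raw_capacity, spare_fraction):
    """Capacity left if a fraction of extra systems is bought as spares."""
    return raw_capacity / (1.0 + spare_fraction)


def usable_with_mirroring(raw_capacity):
    """Capacity left if every byte is stored twice."""
    return raw_capacity / 2.0


def usable_with_fibre_channel(raw_capacity, capacity_penalty=5.0):
    """Capacity left after the ~5x reduction quoted for fibre channel disks."""
    return raw_capacity / capacity_penalty


if __name__ == "__main__":
    ide = 1.0  # normalised capacity bought with the budget as IDE servers
    print(f"IDE + 10% spares: {usable_with_spares(ide, 0.10):.2f}")
    print(f"IDE mirrored:     {usable_with_mirroring(ide):.2f}")
    print(f"fibre channel:    {usable_with_fibre_channel(ide):.2f}")
```

Under these assumptions even fully duplicated IDE storage keeps more usable capacity than the fibre channel option, which is the sense of the question "Still cheaper than SAN?".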
Disk Price/Performance Evolution
[Figure: price in SFr per GByte (log scale, 1 to 100) against time since Jan 2000 (ticks 1 to 38), for 40, 60, 80, 120, 160, 180 and 200 GB disks, a disk server and a non-mirrored disk server. Annotations: factor 6 in 3 years; factor 2.5 difference.]
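The "factor 6 in 3 years" annotation on the disk price chart can be turned into an annual rate and a price halving time. The sketch below assumes a constant exponential decline over the period, which is a simplification of whatever the plotted points actually show; the function names are illustrative.

```python
import math


def annual_factor(total_factor, years):
    """Per-year price reduction factor implied by a total factor over a period."""
    return total_factor ** (1.0 / years)


def halving_time_years(total_factor, years):
    """Time for the price to halve under constant exponential decline."""
    return years * math.log(2.0) / math.log(total_factor)


if __name__ == "__main__":
    print(f"annual factor: {annual_factor(6.0, 3.0):.2f}")
    print(f"halving time:  {halving_time_years(6.0, 3.0) * 12:.0f} months")
```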
Storage System: Tapes
The first TCO question is “Do we need them?” Disk storage costs are dropping… But
– Disk servers need system administrators; idle tapes sitting in a tape silo don’t.
– With a disk-only solution, we need storage for at least twice the total data volume to ensure no data loss.
– Server lifetime is 3-5 years; data must be copied periodically.
» Also an issue for tape, but the lifetime of a disk server is probably still less than the lifetime of a given tape media format.
Assumption today is that tape storage will be required.
Storage System: Tapes
Tape robotics is easy.
– Bigger means better cost/slot.
Tape drives: high end vs LTO
– TCO issue: LTO drives are cheaper than high end IBM and STK drives, but are they reliable enough for our use?
» c.f. the IDE disk server area.
The real problem, though, is tape media.
– The vast portion of the data is accessed rarely but must be stored for a long period. Strong pressure to select a solution that minimises an overall cost dominated by tape media.
Storage System: Managed Storage
Should CERN build or buy software systems? How to measure the value of a software system?
– Initial cost:
» Build: Staff time to create required functionality
» Buy: Initial purchase cost of system as delivered plus staff time to install and configure for CERN.
– Ongoing cost:
» Build: Staff time to maintain system and add extra functionality
» Buy: License/maintenance cost plus staff time to track releases. Extra functionality that we consider useful may or may not arrive.
Choice:
– Batch system: Buy LSF.
– Managed storage system: Build CASTOR.
Use this model as we move on to consider system management software.
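The build or buy model above reduces to an initial cost plus an ongoing cost over the lifetime of the system. A minimal sketch follows. All cost figures are invented, in arbitrary units, purely to exercise the formula; the class and function names are hypothetical, and the model leaves out the harder question of whether useful extra functionality ever arrives.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    initial_cost: float  # staff time to create, or purchase plus installation
    annual_cost: float   # maintenance staff time, or licence plus release tracking

    def total_cost(self, years):
        """Initial cost plus ongoing cost over the given number of years."""
        return self.initial_cost + self.annual_cost * years


def break_even_years(a, b):
    """Years after which both options have cost the same, or None if never."""
    if a.annual_cost == b.annual_cost:
        return None
    years = (b.initial_cost - a.initial_cost) / (a.annual_cost - b.annual_cost)
    return years if years >= 0 else None


if __name__ == "__main__":
    # Invented figures in arbitrary units, for illustration only.
    build = Option("build", initial_cost=10.0, annual_cost=2.0)
    buy = Option("buy", initial_cost=4.0, annual_cost=3.0)
    print(build.total_cost(5), buy.total_cost(5))
    print(break_even_years(build, buy))
```

With these made-up numbers buying is cheaper in the early years and building wins once the system has run longer than the break-even point.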
Installation and Configuration
Reproducibility and guaranteed homogeneity of system configuration is a clear method to minimise ongoing system management costs. A management framework is required that can cope with the numbers of systems we expect.
We faced the same issues as we moved from mainframes to RISC systems. Vendor solutions offered then were linked to hardware, so we developed our own solution.
Is a vendor framework acceptable if we have a homogeneous park of Linux systems?
– Being honest, why have we built our own again?
Installation and Configuration
Installation and configuration is only part of the overall computer centre management:
ELFms architecture
[Figure: four components: Node Configuration System, Monitoring System, Installation System, Fault Mgmt System.]
Systems provided by vendors cannot (yet) be integrated into such an overall framework.
And there is still a tendency to differentiate products on the basis of management software, not raw hardware performance.
– This is a problem for us as we cannot ensure we always buy brand X rack mounted servers or blade systems.
– In short, life is not so different from the RISC system era.
Monitoring and Control
Assuming that there are clear interfaces, why not integrate a commercial monitoring package into our overall architecture?
Two reasons:
– No commercial package meets (met) our requirements in terms of, say, long term data storage and access for analysis.
» This could be considered self serving: we produce requirements that justify a build rather than buy decision.
– Experience has shown, repeatedly, that monitoring frameworks require effort to install and maintain, but don’t deliver the sensors we require.
» Vendors haven’t heard of LSF, let alone AFS.
» A good reason!
Hardware Management System
A specific example of the integration problem. Workflows must interface to local procedures for, e.g., LAN address allocation. Can we integrate a vendor solution? Do complete solutions exist?
[Figure: hardware management workflow diagram spanning Remedy/HMS, Remedy/PRMS, Remedy/DCS and the FIO/OPT, FIO/IS, DCS and CS groups. Steps shown: Request New Machine Install [FIO/IS]; Decide New Identity [FIO/OPT]; Install [FIO/IS]; Request Physical Machine Install [FIO/OPT]; Physically Install Machine [DCS]; Connect to Network [CS]; Check and Update Information [FIO/OPT]; Request Network Connection [FIO/OPT]. Other labelled actions: Import Node Map, Raise Ticket, Retire Node, Move Machine, Perform db updates & checks, Install S/W & put in prod'n, Observe, Change Status, Close Ticket, Req. n/w conn & dns entry, Update CS DB & DNS, Confirmation email.]
Console Management
We will do better:
[Figure: console management architecture. Components: user applications on xxx, pcitfionnn and lxplusnnn nodes; CDB config service (machine to port @ head node mapping; user to machine authorisations); console servers 1 to 75, each with a server process, conf and log, connected by RS/232 to its machines (Machine 1.1 to Machine 1.44, …, Machine 75.1 to Machine 75.44); console log repository.]
TCO issue: Do the benefits of a single console management system outweigh costs of developing our own? How do we integrate vendor supplied racks of preinstalled systems?
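The two tables the console management diagram attributes to the CDB config service (machine to console port, and user to machine authorisations) can be sketched as plain dictionaries. Only the 75 console servers and 44 machines per server come from the diagram; the naming scheme, the example user and the function names below are all hypothetical.

```python
N_SERVERS = 75          # console servers, as in the diagram
PORTS_PER_SERVER = 44   # machines attached to each console server


def build_port_map():
    """Map each machine name to its (console server, serial port) pair."""
    return {
        f"machine-{server}.{port}": (f"console-server-{server}", port)
        for server in range(1, N_SERVERS + 1)
        for port in range(1, PORTS_PER_SERVER + 1)
    }


# Hypothetical authorisation table: user -> machines they may open a console on.
AUTHORISATIONS = {"alice": {"machine-1.1", "machine-75.44"}}


def locate_console(user, machine, port_map, authorisations):
    """Return (console server, port) for a machine if the user is authorised."""
    if machine not in authorisations.get(user, set()):
        raise PermissionError(f"{user} may not access {machine}")
    return port_map[machine]


if __name__ == "__main__":
    port_map = build_port_map()
    print(len(port_map), "machines reachable")
    print(locate_console("alice", "machine-75.44", port_map, AUTHORISATIONS))
```

A user application would perform this lookup first and then connect to the returned console server, which is what makes a single point of entry possible across all racks.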
Hardware Purchase
The issue at hand: How do we work within our purchasing procedures to purchase equipment that minimises our total cost of ownership?
At present, we eliminate vast areas of the multi-dimensional space by assuming we will rely on ELFms for system management and CASTOR for data management. Simplified[!!!] view:
– CPU: White box vs 1U vs blades; install or ready packaged
– Disk: IDE vs SAN; level of vendor integration
HELP! Can we benefit from management software that comes with ready built racks of equipment in a multi-vendor environment?