Experimental Computing
Frank Porter
System Manager: Juan Barayoga
1 Frank Porter, Caltech DoE Review, July 21, 2004
HEP Experimental Computing System Description (I)
CPU farm
– Linux on dual-Intel-CPU rack-mounted units, 1–2 GByte memory each
– Currently 122 CPUs in farm
– “PBS” batch system for resource allocation
– 100BaseT network connection to each unit
– KVM switches for local keyboard/mouse/video
File servers
– Also Linux/Intel-based
– IDE RAID 5 technology
– 7.9 TByte capacity on five servers
– Gigabit Ethernet
– NFS, AFS, Samba file-serving software
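RAID 5 protects each stripe of data with a parity block. As a generic sketch of that idea (not the file servers' actual configuration, which the slides do not describe), the code below XORs the data blocks of one stripe to form parity and rebuilds a single lost block from the survivors. The block contents, the three-data-disk stripe, and all function names are hypothetical; real RAID 5 also rotates the parity block across the disks, which this sketch omits.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks: list[bytes]) -> bytes:
    # One parity block per stripe: the XOR of all data blocks in that stripe.
    return xor_blocks(data_blocks)


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    # A single missing block equals the XOR of the parity and the survivors.
    return xor_blocks(surviving + [parity_block])


def usable_fraction(n_disks: int) -> float:
    # One disk's worth of space in the array is consumed by parity.
    return (n_disks - 1) / n_disks


if __name__ == "__main__":
    stripe = [b"ABCD", b"EFGH", b"IJKL"]  # hypothetical data on three disks
    p = parity(stripe)
    lost = stripe.pop(1)                  # simulate losing the second disk
    assert rebuild(stripe, p) == lost
    print(usable_fraction(4))
```

With n disks in an array, usable capacity is (n - 1)/n of the raw total, so `usable_fraction(4)` gives 0.75.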
HEP Experimental Computing System Description (II)
Interactive servers
– Desktops: a mixture of Linux and Windows on Intel
– Central interactive Linux servers (4 dual-CPU), legacy AIX servers
– Recent purchase of eight high-performance desktops (including local RAID 0 Serial ATA) for heavy interactive analysis
Other CPU services
– NT domain servers (primary and secondary)
– Web servers (Linux and NT)
– Objectivity (for BaBar)
Tape drives
– DLT tape library on file server
– DLT drive on NT
– Mostly used for backups now
HEP Experimental Computing System Description (III)
Network
– 100BaseT capability available everywhere; 2 subnets for security and capacity
– HEP gigabit Ethernet switches, plus CITNET 2000
– Wireless 11 Mbps, maintained by Caltech
– WAN supported by Caltech
Printers (principally 2 color and 2 B&W)
VGA projectors (conference rooms, plus a roaming unit)
UPS for critical services (network, file servers, mail server, web server)
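To put these link speeds in perspective, here is a back-of-envelope sketch of ideal transfer times. The 100 GB dataset size is an illustrative assumption (the analysis slide mentions datasets of hundreds of GB), the rates are nominal line rates, and the helper name `transfer_hours` is hypothetical.

```python
# Nominal line rates mentioned on the network slide, in bits per second.
LINK_RATES_BPS = {
    "100BaseT": 100e6,
    "Gigabit Ethernet": 1e9,
    "11 Mbps wireless": 11e6,
}


def transfer_hours(size_gb: float, rate_bps: float) -> float:
    """Ideal time in hours to move size_gb decimal gigabytes at rate_bps.

    Ignores protocol overhead, disk throughput, and link sharing, so real
    transfers take longer.
    """
    bits = size_gb * 1e9 * 8
    return bits / rate_bps / 3600.0


if __name__ == "__main__":
    dataset_gb = 100  # illustrative size, not a measured dataset
    for name, rate in LINK_RATES_BPS.items():
        print(f"{name:>17}: {transfer_hours(dataset_gb, rate):6.2f} h")
```

At these nominal rates, 100 GB takes about 2.2 hours over 100BaseT, about 13 minutes over Gigabit Ethernet, and about 20 hours over 11 Mbps wireless.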
Caltech is a major site for BaBar Monte Carlo production
– Two jobs on each of 40 dual-PIII nodes; four jobs (hyperthreading mode) on each of 20 dual-Xeon nodes. Each job has 512 MB RAM available.
Allocations
– Signal modes (large variety)
– Generics: B⁰B̄⁰, B⁺B⁻, cc̄, uds, τ⁺τ⁻, μ⁺μ⁻
Alex Samuel runs MC production at Caltech
– Checks 1–2 times/day
– Requests a new allocation every 2–3 weeks
– Consecutive allocations run overlapped, so there is no dead time except when the queue is drained to upgrade conditions, background triggers, or software
Currently the third most productive BaBar site for SP6.
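The node counts on this slide determine how many Monte Carlo jobs run at once and how much memory each node must supply. A small sketch of that arithmetic, assuming every node runs its full job count simultaneously; the class and function names are hypothetical.

```python
from dataclasses import dataclass

MB_PER_JOB = 512  # memory available to each Monte Carlo job


@dataclass
class NodeClass:
    name: str
    nodes: int
    jobs_per_node: int


# The two node types described for Monte Carlo production.
NODE_CLASSES = [
    NodeClass("dual PIII", nodes=40, jobs_per_node=2),
    NodeClass("dual Xeon, hyperthreading", nodes=20, jobs_per_node=4),
]


def job_slots(classes: list) -> int:
    """Total number of Monte Carlo jobs running at once."""
    return sum(c.nodes * c.jobs_per_node for c in classes)


def mb_per_node(c: NodeClass) -> int:
    """Memory a node needs so every concurrent job gets MB_PER_JOB."""
    return c.jobs_per_node * MB_PER_JOB


if __name__ == "__main__":
    print("concurrent jobs:", job_slots(NODE_CLASSES))
    for c in NODE_CLASSES:
        print(f"{c.name}: {c.nodes * c.jobs_per_node} jobs, "
              f"{mb_per_node(c)} MB per node")
```

This gives 160 concurrent jobs (80 on each node type), needing 1024 MB per dual-PIII node and 2048 MB per dual-Xeon node, in line with the 1–2 GByte per unit quoted for the farm.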
Caltech BaBar Monte Carlo Production
Caltech BaBar Monte Carlo Production, Weekly Stats
Stats from BaBar Monte Carlo production sites in past week (July 16, 2004):
Total events produced in SP6: 54.12 M
Site        Runs done  Runs failed  Failure rate (%)  Events (M)  Machines  Events/machine (M)  CPU eff. (%)  Site eff. (%)
uvic2            3305            7            0.2114       7.248        78              0.0929          73.7          127.7
utd              2231          134            5.6660       7.018        62              0.1132          96.7           76.7
caltech          3447           18            0.5195       6.428        56              0.1148          83.7          127.2
osu              2358            8            0.3381       4.196        27              0.1554          97.2           91.9
cu-boulder       2072           34            1.6144       4.087        87              0.0470          95.1           39.8
albany           1037         1594           60.5853       4.028        32              0.1259          95.0           97.6
tud              2293         1589           40.9325       4.016        33              0.1217          79.6          130.5
ccin2p3           228            6            2.5641       3.608       372              0.0097          67.0           11.2
utenn            1431           97            6.3482       3.456        40              0.0864          98.4           90.5
infn             1867          165            8.1201       3.220        39              0.0826          71.9          117.0
fzk              1175          297           20.1766       2.762       343              0.0081          80.2            4.6
uk-spgrid         947           10            1.0449       2.196        34              0.0646          79.5           79.4
uvic              376          127           25.2485       1.144        20              0.0572          90.3           84.9
slac              258            0            0.0000       0.432        32              0.0135          97.0            7.4
westgrid          738            1            0.1353       0.148        71              0.0021          79.0            1.8
uofl                0            3          100.0000       0.133        12              0.0111          91.7            5.8
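The derived columns in this table follow from the raw counts. The sketch below assumes Failure Rate = failed / (done + failed) × 100 and Events/Machine = events / machines; with those definitions it reproduces the printed values, the 54.12 M total, and Caltech's third-place ranking by events. The CPU and site efficiency columns are not recomputed because their definitions are not given. Variable and function names are illustrative.

```python
# Raw columns from the weekly table: (site, runs done, runs failed,
# events in millions, machines). Derived columns are recomputed below.
SITES = [
    ("uvic2", 3305, 7, 7.248, 78),
    ("utd", 2231, 134, 7.018, 62),
    ("caltech", 3447, 18, 6.428, 56),
    ("osu", 2358, 8, 4.196, 27),
    ("cu-boulder", 2072, 34, 4.087, 87),
    ("albany", 1037, 1594, 4.028, 32),
    ("tud", 2293, 1589, 4.016, 33),
    ("ccin2p3", 228, 6, 3.608, 372),
    ("utenn", 1431, 97, 3.456, 40),
    ("infn", 1867, 165, 3.220, 39),
    ("fzk", 1175, 297, 2.762, 343),
    ("uk-spgrid", 947, 10, 2.196, 34),
    ("uvic", 376, 127, 1.144, 20),
    ("slac", 258, 0, 0.432, 32),
    ("westgrid", 738, 1, 0.148, 71),
    ("uofl", 0, 3, 0.133, 12),
]


def failure_rate_pct(done: int, failed: int) -> float:
    # Assumed definition: failed runs as a percentage of all attempted runs.
    return 100.0 * failed / (done + failed)


def events_per_machine(events_m: float, machines: int) -> float:
    # Millions of events per machine over the week.
    return events_m / machines


def total_events(rows) -> float:
    return sum(row[3] for row in rows)


def rank_by_events(site: str, rows) -> int:
    ordered = sorted(rows, key=lambda row: row[3], reverse=True)
    return [row[0] for row in ordered].index(site) + 1


if __name__ == "__main__":
    for site, done, failed, events_m, machines in SITES:
        print(f"{site:>10}  {failure_rate_pct(done, failed):8.4f}  "
              f"{events_per_machine(events_m, machines):.4f}")
    print(f"total: {total_events(SITES):.2f} M events")
    print("caltech rank:", rank_by_events("caltech", SITES))
```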
BaBar Physics Analysis at Caltech
Use of Caltech computing for BaBar data analysis is increasing
– Off-loads SLAC computing
– Using both batch queues and interactive analysis
Set up for full CM2-based user analysis
– Have new releases
– Can import datasets from SLAC
– Can compile/debug/run user code; building and running is generally faster than running at SLAC
Used for storage and analysis of large (hundreds of GB) physics datasets (e.g., ntuples)
MINOS Looming Large
Before data taking, MINOS uses the farm for occasional Monte Carlo runs and some analysis.
Beam is expected to start in December. How the computing model will apply to Caltech remains somewhat unclear, but it will likely mean something like:
– Some reprocessing of data
– Monte Carlo production, possibly substantial
– Physics analysis, requiring efficient interactive access to data
A worry: FNAL is not going with Red Hat Enterprise Linux.
Continuous Evolution
Farm continues to grow to match needs.
– In 2004 installed an additional 20 dual-CPU machines with 2.8 GHz Intel Xeon processors. Currently configuring an order for another 20. Haven’t hit any scaling limit yet.
– Blade servers: opted against so far; will continue to watch.
– Rack space and power/heat load are issues. Recently added electrical capacity; will have to add more.
– Cost/unit approximately constant, performance/unit increases.
Disk space continues to grow to match needs.
– Cost/unit approximately constant, performance/unit increases.
Replaced DQS batch system with PBS.
No remaining reliance on AIX for services.
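Under PBS, work is submitted as job scripts whose `#PBS` comment lines carry the resource requests. A minimal sketch that generates such a script, assuming Torque/OpenPBS-style `-l nodes=1:ppn=1` syntax. The job name, command, walltime default, and queue argument are hypothetical, and the 512 MB default mirrors the per-job memory quoted for Monte Carlo production rather than any documented local policy.

```python
from typing import Optional


def pbs_script(job_name: str, command: str, mem_mb: int = 512,
               walltime: str = "24:00:00",
               queue: Optional[str] = None) -> str:
    """Build a minimal single-slot PBS job script as a string."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",            # job name shown by qstat
        "#PBS -l nodes=1:ppn=1",          # one processor on one node
        f"#PBS -l mem={mem_mb}mb",        # memory request
        f"#PBS -l walltime={walltime}",   # hypothetical default limit
        "#PBS -j oe",                     # join stderr into the stdout file
    ]
    if queue is not None:
        lines.append(f"#PBS -q {queue}")  # destination queue
    # PBS starts jobs in the home directory; move to the submit directory.
    lines.append('cd "$PBS_O_WORKDIR"')
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical Monte Carlo job; save the output to a file, then qsub it.
    print(pbs_script("mc_job", "./run_mc.sh"), end="")
```

The generated file would then be submitted with `qsub`; the site's actual queues and limits are not described in these slides.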
Continuous Evolution (II)
Disk space and network replacing tapes; tapes required mostly for
backups.
Recently upgraded UPS; still need a bit more.
Additional switches (Cisco 2948G, 2970) recently purchased; currently have four.
Caltech Support
Caltech’s ITS (Information Technology Services) provides a variety of services benefiting HEP
Site-wide software license agreements
– These agreements have improved considerably over time, and are now quite flexible in permitting desired uses.
– AutoCAD, Visio, Pro/ENGINEER
– Maple, Mathematica, Matlab
– Microsoft: OS, Office, Visual Studio, Project
– Norton AntiVirus (NAV)
– PCTeX
– SSH, WinSCP
– Adobe Acrobat
– New: Red Hat Enterprise Linux
Caltech Support – Networking
Networking support
– Campus backbone, and equipment/maintenance for WAN connection, provided by Caltech
– Caltech provides ISP (cable modem, PPP, ISDN) service
– Caltech monitors security alerts
– Major Caltech campus-wide network upgrade
– Uniform wireless 802.11b implementation (partial)
Operations
New farm and CPU-servers were brought up with help from BaBar,
and largely restricted to BaBar as “guinea pigs.” Now supporting all
of the groups.
However, use continues to be dominated by BaBar with heavy Monte
Carlo and analysis demands.
Decision to go uniformly with Linux was largely the result of personnel
concerns. [Phase-out of AIX is essentially complete.]
Juan Barayoga is system manager.
Additional help from part-time students and physicists.
Comments on Caltech HEP computing budget
Computing requirements are increasing rapidly; at the same time, computing power per dollar increases.
Operating budget request is made up of two pieces:
– Maintenance and system administration, almost entirely salary.
– Equipment budget. This actually supersedes most of the former “maintenance” expense, with continuous “upgrade”: replacement is more effective than repair.
Proposal is to maintain current level of funding.