TRIUMF SITE REPORT Corrie Kost
TRIUMF Site Report for HEPiX, Edinburgh, 24-28 May 2004 – Corrie Kost
TRIUMF SITE REPORT
Corrie Kost
LINUX at TRIUMF
Distribution | Use | ISO CDs | Kickstart available | Auto updates
RH9 | Servers, desktop | Yes | Yes | Yes (only errata / no new hardware)
Fedora Core 1 | Leading desktop, special-needs servers | Yes | Yes | Yes (errata for 18 months)
Scientific Linux | Desktop; future servers & desktops (support!) | Yes | Yes | Yes (36 months for hardware, 60 months for errata by RH)
TRIUMF urges proper support for Scientific Linux
WAN
Replacement MRV units (10 Gb/sec capable)
Third Passport Router
WestGrid – UBC/TRIUMF Site
• 504 dual 3.06 GHz Xeon IBM blades
• Red Hat Linux 9 to allow GPFS (NFS nixed)
• OPENPBS Scheduling with (MOAB) Maui
• 10 TB disk storage
• 70 TB tape storage
• Direct Gigabit connection between sites
• Possible 10GB in future
• February 2004: opened for general use
WestGrid – UBC/TRIUMF Site (www.westgrid.ca)
• From a cold start:
  • GPFS servers load in 5-10 min
  • All nodes up in 60-90 min
• Bring up single nodes: 10 min
• Rebuild (disk) for node: 2 hrs
• Single node failure rate ~ 1/day
• Node disk failures dominate
• Utilization about 87%
Network / Servers
Servers Upgrade Program
LCG Grid Participant
High I/O Testbed
Hardware nice but…
• 40-pin IDE cable is a problem with 2.6 kernel
• Mounting bracket screws can short audio & halt boot
STORM1 & STORM2
• Dual 3.2 GHz Xeons
• 4GB memory
• 4 3WARE 8506-4LP
• 16 SATA150 120GB DRIVES
• 20GB ST92011A DRIVE
• INTEL 10GBE PXLA8590LR
High Speed I/O - Part 1
Used ext2 for highest speeds (no journaling, but 2TB file size limit)
• RH 9, one to four disks (writes), software RAID 0, 3-Ware controller: 50.6, 98, 124, 141 MB/sec respectively
• Four disks split over two 3-Ware controllers: 162 MB/sec writes
• Four disks on 1 hardware RAID 0 and software RAID 0: 138 MB/sec writes
• Adding 4 more disks on second 3-Ware: 250 MB/sec (slots 2,5), 247 MB/sec (slots 2,3)
• Adding 4 more disks on third 3-Ware: 273 MB/sec (slots 2,3,5), 265 MB/sec (slots 2,3,4)
• Adding 4 more disks on fourth 3-Ware: 283 MB/sec (slots 2,3,4,5)
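As a rough cross-check of the one-to-four-disk write figures above, the sketch below compares each measured rate with ideal linear scaling from the single-disk rate. The helper name scaling_efficiency is hypothetical, and linear scaling is an assumed baseline for illustration, not something the slides claim.

```shell
#!/bin/sh
# Hypothetical helper: percentage of ideal linear scaling achieved.
# $1 = single-disk rate (MB/sec), $2 = number of disks, $3 = measured rate (MB/sec)
scaling_efficiency() {
    awk -v one="$1" -v n="$2" -v got="$3" \
        'BEGIN { printf "%.0f\n", 100 * got / (one * n) }'
}

# Write rates quoted above for 1 to 4 disks on software RAID 0.
n=1
for rate in 50.6 98 124 141; do
    echo "$n disk(s): $rate MB/sec, $(scaling_efficiency 50.6 "$n" "$rate")% of linear"
    n=$((n + 1))
done
```

Under these assumptions the four-disk figure comes out near 70% of linear, which suggests the limit is somewhere other than the individual drives.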
High Speed I/O - Part 2
• Using four 3-Ware controllers in hardware RAID 0 mode, software-raided by Linux
• dd if=/dev/zero of=/raid/8GB bs=81920 count=104857
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, HT, ext2 -T news: write 370 MB/sec
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, HT, reiserfs: write 227 MB/sec
• Loaded e2fs module 1.35-7.1 to fix largefile and largefile4 creation with mkfs -T largefile /dev/md0
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, HT, largefile ext2: write 349 MB/sec
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, no HT, largefile ext2: write 300 MB/sec
• Fedora 1, non-SMP, 2.6.6#1, HT, largefile ext2: write 375 MB/sec
• Replacing the 40-pin IDE cable to the main disk with an 80-pin one allowed SMP to boot
• Fedora 1, SMP, 2.6.6#1, no HT, largefile ext2: write 309 MB/sec
• echo 262144 > /proc/sys/net/core/rmem_default
• echo 8388608 > /proc/sys/net/core/rmem_max
• echo 262144 > /proc/sys/net/core/wmem_default
• echo 8388608 > /proc/sys/net/core/wmem_max
• echo 300000 > /proc/sys/net/core/netdev_max_backlog
• echo 8388608 > /proc/sys/net/core/optmem_max
• sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
• sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
• sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
• Iperf maxed out at 2.3 Gbits/sec with a 2.6.6 kernel recompiled for WEB100
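The tuning list above mixes direct writes under /proc/sys with sysctl -w calls; both address the same kernel parameters. As a minimal sketch, the script below maps each /proc/sys path to its dotted sysctl key and prints lines in sysctl.conf form. The helper name proc_to_sysctl is hypothetical; the values are copied from the list above.

```shell
#!/bin/sh
# Hypothetical helper: map a /proc/sys path to the dotted key that sysctl expects.
proc_to_sysctl() {
    echo "$1" | sed -e 's|^/proc/sys/||' -e 's|/|.|g'
}

# Print sysctl.conf-style lines for the /proc/sys writes listed above.
while read -r value path; do
    printf '%s = %s\n' "$(proc_to_sysctl "$path")" "$value"
done <<'EOF'
262144 /proc/sys/net/core/rmem_default
8388608 /proc/sys/net/core/rmem_max
262144 /proc/sys/net/core/wmem_default
8388608 /proc/sys/net/core/wmem_max
300000 /proc/sys/net/core/netdev_max_backlog
8388608 /proc/sys/net/core/optmem_max
EOF
```

Lines in this form can be kept in /etc/sysctl.conf and loaded with sysctl -p, so the settings survive a reboot. Note that net.ipv4.tcp_mem is counted in memory pages rather than bytes, so its values are not directly comparable to tcp_rmem and tcp_wmem.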
High Speed I/O - Part 3
[root@storm2 root]# time ttcp -t -b 6000000 -l 102400 storm1-10g </raid/8gb-a
ttcp-t: buflen=102400, nbuf=2048, align=16384/0, port=5001, sockbufsize=6000000 tcp -> storm1-10g
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: connect
ttcp-t: 8589934592 bytes in 42.80 real seconds = 195978.14 KB/sec +++
ttcp-t: 83887 I/O calls, msec/call = 0.52, calls/sec = 1959.80
ttcp-t: 0.0user 22.2sys 0:42real 52% 0i+0d 0maxrss 0+25pf 17854+622csw
Ttcp disk to disk 191 Mbytes/sec
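The 191 Mbytes/sec figure follows from the ttcp output above, where KB means 1024 bytes. A small sketch of the unit conversions, with hypothetical helper names and Gbit taken as 10^9 bits:

```shell
#!/bin/sh
# Hypothetical helper: KB/sec as reported by ttcp (1 KB = 1024 bytes) to MB/sec.
kb_to_mb() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 }'
}

# Hypothetical helper: bytes moved in a given time, in Gbit/sec (10^9 bits assumed).
bytes_to_gbit() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b * 8 / s / 1e9 }'
}

echo "$(kb_to_mb 195978.14) MB/sec"
echo "$(bytes_to_gbit 8589934592 42.80) Gbit/sec"
```

This gives about 191.4 MB/sec, or roughly 1.61 Gbit/sec on the wire for the disk-to-disk transfer.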
Three Walls:
• CPU: 100% seen
• 3Ware I/O controller (140 MB/sec instead of 4*50, 375 MB/sec instead of 4*140)
• 10Gbit Intel card using ixgb-1.0.65 driver (2.3 Gb/sec)
Ongoing: Tuning
• Process affinity (using /usr/bin/run)
• Interrupt affinity (IRQ of 3-ware and 10GbE set to CPUs, e.g. /proc/irq/24/smp_affinity)
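The smp_affinity file takes a hexadecimal CPU bitmask, with bit N selecting CPU N. The sketch below builds a single-CPU mask and prints the command that would pin an IRQ, without running it. The helper name cpu_mask and the choice of CPU 1 are assumptions for illustration; IRQ 24 is the example number from the path above.

```shell
#!/bin/sh
# Hypothetical helper: hex bitmask selecting a single CPU (CPU 0 -> 1, CPU 1 -> 2, ...).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

irq=24   # example IRQ number taken from the path quoted above
cpu=1    # assumed target CPU, for illustration only

# Print, rather than run, the command; writing to /proc/irq requires root.
echo "echo $(cpu_mask "$cpu") > /proc/irq/$irq/smp_affinity"
```

For process affinity, taskset from util-linux (for example taskset -c 0 followed by a command) is a commonly available alternative; it is named here as a substitute, not as the /usr/bin/run tool the slide refers to.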
Misc. Developments
Build a cheap hot-swap RAID 5 system from Serial ATA drives
• 1 Promise Fasttrack S150 SX4 controller $233 Can
• 3 Promise Superswap 1100 drive enclosures for SATA/150 $112 Can
• 3 Maxtor 120GB S-ATA drives (6Y120M0) $145 Can
Test on cheap 1.7 GHz Celeron, Intel D845GVSLR, 256 MB memory
Red Hat 9.0 base (won't work on updated kernels)
• Read large file: 46.8 Mbytes/sec
• Write large file: 46.5 Mbytes/sec
• Able to pull disk while active; auto rebuilds in 75 min when replaced
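Two numbers follow from the configuration above: the usable capacity of a three-disk RAID 5 set, and the average rate implied by a 75 minute rebuild. The sketch below works them out; the helper names are hypothetical, decimal units (1 GB = 1000 MB) are assumed, and the rebuild is assumed to rewrite the whole 120 GB replacement disk.

```shell
#!/bin/sh
# Usable capacity of an N-disk RAID 5 array: one disk's worth goes to parity.
# $1 = number of disks, $2 = size of each disk in GB
raid5_usable_gb() {
    echo $(( ($1 - 1) * $2 ))
}

# Average rebuild rate in MB/sec, assuming decimal units (1 GB = 1000 MB)
# and that the whole replacement disk is rewritten.
# $1 = disk size in GB, $2 = rebuild time in minutes
rebuild_rate_mb() {
    awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / (min * 60) }'
}

echo "usable: $(raid5_usable_gb 3 120) GB"
echo "rebuild: $(rebuild_rate_mb 120 75) MB/sec"
```

With these assumptions the array offers 240 GB usable, and the rebuild averages about 26.7 MB/sec, a little over half the large-file read and write rates quoted above.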
Misc. Developments
• Remote power on/off using networked power bars (www.servertech.com)
Mail at TRIUMF
IMP Webmail
Squirrel Webmail