OptiPortal Configuration Considerations
![Page 1: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/1.jpg)
OptiPortal Configuration Considerations
Ashley Wright
High Performance Computing and Research Support (QUT)
![Page 2: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/2.jpg)
Our OptiPortal
![Page 3: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/3.jpg)
Our OptiPortal
- 6x Dell Precision T3500
  - Intel Xeon E5520 (2.27GHz)
  - 4GB RAM
  - nVidia FX 1800
  - Onboard 1Gb/s network
  - PCIe 1Gb/s network card (supports jumbo frames)
  - 300GB HDD
- 22x Dell 24” monitors (4x5 configuration)
![Page 4: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/4.jpg)
Considerations
- Want to keep the cluster in a known state.
- Must be able to recover quickly when something goes wrong.
- Need to be able to install applications fast.
- Compile code on the OptiPortal.
- Fast.
- Easy to use.
![Page 5: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/5.jpg)
ROCKS with Viz Roll
- Fairly easy to install.
- Used initially to test the OptiPortal and the software which can run on a vis wall.
- Software was out of date (CentOS 5 vs Fedora 12).
- Difficult to customise.
- Difficult to install our own software.
![Page 6: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/6.jpg)
Similarities to HPC clusters.
- Lots of applications.
- Each node of the cluster is identical.
- Need performance.
- Need to minimise downtime.
![Page 7: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/7.jpg)
HPC Cluster
- Network boot and install.
- Shared file system across nodes.
- Nodes are generally identical.
- Multiple networks for different uses (i.e. management vs MPI).
![Page 8: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/8.jpg)
Installing nodes
- Network boot and auto-install scripts make reinstalling easy.
- Fedora 11 & 12 used.
- Cobbler (https://fedorahosted.org/cobbler/)
  - HTTP/PXE/TFTP
  - DHCP/DNS
  - Yum mirror
  - Also customisation of the install process.
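As a rough sketch of how nodes might be registered for network install, the script below only prints the Cobbler commands (a dry run) rather than running them. The node names, MAC addresses and profile name are hypothetical and are not taken from the talk.

```shell
#!/bin/sh
# Dry run: prints Cobbler commands instead of executing them.
# PROFILE, node names and MAC addresses are hypothetical placeholders.
PROFILE="fedora12-x86_64"

# print_commands: reads "NAME MAC" pairs on stdin, prints one
# 'cobbler system add' per node followed by a final 'cobbler sync'.
print_commands() {
    while read -r name mac; do
        echo "cobbler system add --name=$name --mac=$mac --profile=$PROFILE"
    done
    echo "cobbler sync"
}

commands=$(printf '%s\n' \
    "node1 00:11:22:33:44:01" \
    "node2 00:11:22:33:44:02" | print_commands)
echo "$commands"
```

Once the printed commands have been checked, they could be run on the Cobbler server; after that a node that network boots is reinstalled from its profile.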
![Page 9: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/9.jpg)
Installing nodes - cobbler
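```shell
# Install nvidia driver (Cobbler post-install snippet; $http_server is filled in by Cobbler)
pushd /root/
# Fetch the driver installer from the Cobbler web server
wget http://$http_server/files/NVIDIA-Linux-x86_64-190.53-pkg2.run -O /root/NVIDIA-Linux-x86_64-190.53-pkg2.run
chmod +x /root/NVIDIA-Linux-x86_64-190.53-pkg2.run
# Fetch nvidia-install.sh and register it as an init script so it runs at boot
wget http://$http_server/files/nvidia-install.sh -O /etc/init.d/nvidia-install.sh
chmod +x /etc/init.d/nvidia-install.sh
chkconfig --add nvidia-install.sh
chkconfig nvidia-install.sh on
popd
```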
![Page 10: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/10.jpg)
File Server
- Hosts non-volatile, shared home directories (/home), software directories (/pkg), and the Fedora mirror.
- Built with an old Dell 2900 server:
  - 6x 1.5TB HDD (RAID 0+1).
  - 4x 1Gb/s aggregate network.
  - 250MB/s throughput.
![Page 11: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/11.jpg)
Keeping nodes in 'sync'
- When you change something on one node you want it the same on the other nodes.
- Having a shared home and application directory makes this easy.
- Puppet to manage files in /etc (http://www.puppetlabs.com/)
  - Automated configuration management.
  - Makes sure files and services are in a known state.
  - If they are not, Puppet fixes them.
  - Updates every 30 mins (default).
![Page 12: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/12.jpg)
Nodes in 'sync' - Puppet
```puppet
class sshd {
  # Keep sshd_config identical to the copy held on the Puppet server
  file { "/etc/ssh/sshd_config":
    owner  => root,
    group  => root,
    mode   => 600,
    ensure => present,
    source => "puppet:///files/ssh/sshd_config"
  }
  # Reload sshd only when the managed config file changes
  exec { "/etc/init.d/sshd reload":
    subscribe   => File["/etc/ssh/sshd_config"],
    refreshonly => true,
  }
  # Make sure the service is running
  service { "sshd":
    status => "/etc/init.d/sshd status",
    ensure => running,
  }
}
```
![Page 13: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/13.jpg)
Network
- One network for management (DNS/DHCP).
  - Onboard network, can network boot.
- One network for Internet.
  - PCIe network card, supports jumbo frames.
  - Internet network is outside the QUT firewall.
![Page 14: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/14.jpg)
Performance
- Aim to render 10-25 frames per sec.
- 9600x4800 pixels = 175MB/frame.
- Bottlenecks everywhere, mostly I/O (bus, disk and network).
  - 1x PCIe (Gen 2) = 500MB/s
  - 1Gb/s network = 120MB/s
  - 1.5TB hard disk = 150MB/s (maximum)
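The frame-size figure can be reproduced with a few lines of shell arithmetic. This is a sketch that assumes 4 bytes per pixel (32-bit colour), which the slide does not state but which gives the quoted 175 MB when counted in MiB. The function names are illustrative only.

```shell
#!/bin/sh
# Assumption: 4 bytes per pixel; the slide does not state the pixel format.

# frame_bytes WIDTH HEIGHT BYTES_PER_PIXEL -> size of one uncompressed frame in bytes
frame_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# to_mib BYTES -> whole MiB (integer division, rounds down)
to_mib() {
    echo $(( $1 / 1048576 ))
}

# required_mib_per_s FRAME_BYTES FPS -> data rate in whole MiB/s for the full wall
required_mib_per_s() {
    echo $(( $1 * $2 / 1048576 ))
}

bytes=$(frame_bytes 9600 4800 4)
echo "one frame: $(to_mib "$bytes") MiB"
for fps in 10 25; do
    echo "$fps fps: $(required_mib_per_s "$bytes" "$fps") MiB/s for the whole wall"
done
```

Under that assumption, uncompressed full-wall frames at 10-25 frames per second need on the order of a few GiB/s in aggregate, well beyond any single link or disk listed above.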
![Page 15: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/15.jpg)
Performance - Disk
- First file server.
  - OpenSolaris + ZFS RAID-Z (RAID 5 style, across 6 disks)
  - ZFS makes all reads random seeks
  - <100 MB/s read performance
  - Single 1Gb/s network.
![Page 16: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/16.jpg)
Performance - Disk
- Second server.
  - Fedora 12.
  - SW RAID 0 across 3 HW RAID 1 pairs (2 disks each).
  - Reads mostly sequential.
  - 250 MB/s read performance.
  - 4x 1Gb/s network.
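A simple way to compare the two file servers is to take the slower of disk and network as the limit. The sketch below uses the figures quoted on the slides, takes 100 MB/s as an upper bound for the first server, and assumes traffic spreads evenly over the aggregated links, which will not always hold in practice.

```shell
#!/bin/sh
# Rough estimate: sustained read rate is capped by the slower of disk and network.
# Figures in MB/s, from the slides. Assumes perfect spreading over aggregated links.
LINK_MB_S=120   # one 1Gb/s link

# min_of A B -> smaller of two integers
min_of() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# server_limit DISK_MB_S LINKS -> estimated deliverable MB/s
server_limit() {
    min_of "$1" $(( $2 * $LINK_MB_S ))
}

first=$(server_limit 100 1)    # ZFS server: under 100 MB/s disk, single link
second=$(server_limit 250 4)   # Fedora server: 250 MB/s disk, four links
echo "first server:  at most $first MB/s (disk bound)"
echo "second server: at most $second MB/s (disk bound)"
```

In both cases the disks, not the network, set the ceiling under these assumptions.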
![Page 17: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/17.jpg)
Performance - Compression
- Compressing data files reduces disk I/O.
- CPU time to decompress negligible.
- Better use of I/O cache.
- Decompress straight to memory.
- Can get you over the line (2x-5x improvement).
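The idea can be sketched with standard tools: compress once when the data is stored, then decompress into a pipe when reading, so only the compressed bytes cross the disk and network. The sample data here is synthetic and gzip is an assumed choice of compressor; real ratios depend entirely on the data.

```shell
#!/bin/sh
# Sketch: store a data file compressed, decompress on the fly when reading.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Synthetic, repetitive sample standing in for a real data set.
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i }' > "$workdir/sample.dat"

# Compress once, when the data is written to the file server.
gzip -c "$workdir/sample.dat" > "$workdir/sample.dat.gz"

raw_size=$(( $(wc -c < "$workdir/sample.dat") ))
gz_size=$(( $(wc -c < "$workdir/sample.dat.gz") ))
echo "raw: $raw_size bytes, compressed: $gz_size bytes"

# Read path: decompress into a pipe (no temporary uncompressed copy on disk)
# and check that the content is unchanged.
original_sum=$(cksum < "$workdir/sample.dat")
restored_sum=$(gzip -dc "$workdir/sample.dat.gz" | cksum)
```

In a real application the pipe would feed the renderer's loader instead of `cksum`.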
![Page 18: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/18.jpg)
Issues
- SSH and Puppet security keys change on rebuild.
- Upgrading major OS versions is still a lot of work.
- File server needs more RAM (I/O cache).
- 1 Gb/s is not enough (at times).
- Need to remember to add changes to build scripts.
![Page 19: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/19.jpg)
Issues - Multiple Networks
- Some software does not like multiple networks.
- It looks up the hostname and will only use that IP address.
- Should be able to override this in a config file.
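The behaviour can be illustrated with a toy resolver over a hosts-style file. The names and addresses are hypothetical; the point is only that a lookup of the node's own hostname returns the management address, so the second network is used only if its name or address is given explicitly.

```shell
#!/bin/sh
# Hypothetical example: node1 has one address on each network.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
10.0.0.11     node1        # management network (onboard NIC)
192.168.1.11  node1-data   # second network (PCIe NIC)
EOF

# lookup_ip NAME FILE -> first address whose entry lists NAME
lookup_ip() {
    awk -v name="$1" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$2"
}

# Software that resolves only its own hostname ends up on the management network.
default_ip=$(lookup_ip node1 "$hosts_file")
# A per-application config override would name the other interface explicitly.
override_ip=$(lookup_ip node1-data "$hosts_file")
echo "hostname lookup: $default_ip, explicit override: $override_ip"
rm -f "$hosts_file"
```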
![Page 20: OptiPortal Configuration Considerations](https://reader036.fdocuments.us/reader036/viewer/2022062408/56813ff7550346895dab1e2c/html5/thumbnails/20.jpg)
Questions?