Platform OCS 5 - Queen's Universitywiki.phy.queensu.ca/hughes/images/0/03/OCS5-Day2-module1.pdf ·...

Post on 29-May-2020


Day 2, module 1

Platform OCS 5: Installer node setup and configuration

Module Objectives

Upon completion of this module, you will be able to:

•Install and configure an Installer Node

•Verify that the Installer Node is properly configured

Pre-installation

Installing Red Hat HPC: software check, hardware check, network check, network information

Installing OCS: software check, hardware check, network check, network information

[Diagram label: Public/Corporate Network]

Pre-installation

We’ll spend the next few minutes covering the prerequisites for installing a Platform OCS 5 cluster.

Building an OCS 5 or RH HPC Installer Node

Platform OCS 5 can be installed in two different ways:

1. From a RHEL 5.1 machine via Red Hat Network

2. From an OCS 5 CD/DVD + Red Hat media

The preferred installation mechanism for Red Hat is via Red Hat Network (option 1)

All other Platform OCS partners use option 2

Option 1: Installing via Red Hat Network or Satellite Server

Pre-installation check: RH HPC

Minimal hardware requirements for installer node:

•512 MB of physical memory (RAM)

•40 GB disk

•DVD/CD drive

•One Ethernet interface (two interfaces for a Beowulf cluster)

Software configuration required:

•Red Hat Enterprise Linux 5.1

•Red Hat installed on the machine

•At least one statically configured Ethernet interface
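The memory requirement above can be checked from a shell once RHEL is running. This is a minimal sketch and not part of the OCS tooling: mem_total_mb is a hypothetical helper name, and it assumes the usual Linux /proc/meminfo layout in which MemTotal is reported in kB.

```shell
# Hypothetical helper (not an OCS command): read /proc/meminfo-style text
# on stdin and print the MemTotal value converted from kB to whole MB.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024; exit }'
}

# Typical use on the installer node:
#   mem_total_mb < /proc/meminfo
```

The kernel reports slightly less than the installed RAM, so a machine with exactly 512 MB fitted will print a figure a little under 512.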

Pre-installation network check – RH HPC

All compute nodes should be configured for PXE boot (change the BIOS settings)

The Ethernet switches need to be configured properly:

•Spanning tree should be disabled

•PortFast should be enabled (if supported)

•The switch should be tested prior to the install: everything connected properly, no bad ports

PXE refers to the Preboot Execution Environment supported on most NICs. Kusu and Platform OCS rely on this functionality to ease node management

Pre-installation network information – RH HPC

The following network-related information will be required during the install process:

The installer node public interface can be configured via DHCP

Installer node host information:

•Hostname (Fully Qualified Domain Name)

•Static IP address

•Subnet mask

•Gateway address (to public internet)

•DNS server IP address(es) (optional)

The installer node private interface must be static

Private network information:

•Private IP address

•Subnet mask

Don’t guess on any of these settings: messing up means a re-install!
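Because a mistyped address means a re-install, it can help to sanity-check the address, mask, gateway and DNS values before starting. The function below is a sketch under stated assumptions: is_ipv4 is a hypothetical helper, not an OCS or RHEL command, and it only checks the dotted-quad form (four numbers from 0 to 255), not whether the address is right for your network.

```shell
# Hypothetical helper: succeed if $1 looks like a dotted-quad IPv4 value.
is_ipv4() {
    # Reject empty strings, stray characters, and misplaced dots up front.
    case $1 in
        *[!0-9.]*|.*|*.|*..*|'') return 1 ;;
    esac
    # Split on dots; the string holds only digits and dots at this point,
    # so the unquoted expansion cannot glob.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Typical use:
#   is_ipv4 "$gateway" || echo "gateway address is malformed"
```

The same check works for subnet masks, since they use the same notation; it does not verify that a mask is contiguous.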

Installing RHEL 5.1

•Standard RHEL 5.1 install

•The software was tested on RHEL 5.1, not RHEL 5.0

Installing RHEL 5.1

•Minimum network config:

•One statically defined Ethernet interface

•Other interfaces can be static or DHCP

•1 network interface must have access to the corporate network, or internet for RHN or Satellite Server.

•Manual hostname is optional for the DHCP network interface

•Manual hostname is required for the static private interface.

•Choose a private hostname and domain that will not conflict with any names/domains in your environment.

•Network configuration mistakes are the most common problem during OCS installation

Installing RHEL 5.1

•Choose:

•Web Server

•Software Development

•RH HPC requires a web server

•Software Development, while not required, is useful in an HPC cluster

Installing RHEL 5.1

Disk Partitioning:

•A minimum 35 GB disk is required

•OCS stores OS repositories, which are typically 4 GB each.

•The more disk space the better

•OCS installs OS repositories in /depot by default

•/depot should have a minimum of 15 GB allocated.

•The default partitioning works with OCS

•LVM partitioning recommended for the OCS Installer Node.
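After the RHEL install, the space available for /depot can be compared against the 15 GB minimum. This is an illustrative sketch only: has_min_gb is a hypothetical helper, and the usage lines assume /depot (or the file system that will hold it) already exists and that df -P prints the available space in 1 KB blocks in its fourth column.

```shell
# Hypothetical helper: succeed if an "available KB" figure (as printed by
# df -P) amounts to at least the given number of whole GB.
has_min_gb() {
    avail_kb=$1
    min_gb=$2
    [ $((avail_kb / 1024 / 1024)) -ge "$min_gb" ]
}

# Typical use on the installer node:
#   avail=$(df -P /depot | awk 'NR == 2 { print $4 }')
#   has_min_gb "$avail" 15 || echo "/depot has less than 15 GB available"
```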

RHEL 5.1 Firstboot

•Connect the Server to RHN

•RHN setup can be skipped if needed

•RH HPC will be configured to update from Satellite Server in our Demo

RHEL 5.1 Firstboot

•Additional users can be created if needed

•Network Login is not supported in this release of RH HPC

Firstboot – Network Check

•Minimum Network config:

•One statically defined Ethernet interface

•Other interfaces can be static or DHCP

•1 network interface must have access to the corporate network, or internet for RHN or Satellite Server.

•Network configuration mistakes are the most common problem during OCS installation

RHEL 5.1 Adding GPG keys and RHN Register

Keys for this demo are dummy keys; in the real product release, Red Hat keys will be used.

Platform dummy Key:

wget -q http://intel7.lsf.platform.com/pub/PLATFORM-GPG-PUB-KEY

rpm --import PLATFORM-GPG-PUB-KEY

Fedora EPEL Key:

wget -q http://download.fedora.redhat.com/pub/epel/RPM-GPG-KEY-EPEL

rpm --import RPM-GPG-KEY-EPEL

Add Server to RHEL Satellite Server:

rpm -Uvh http://intel7.lsf.platform.com/pub/rhn-org-trusted-ssl-cert-1.0-1.noarch.rpm

rhnreg_ks --username admin --password <yourpass> --serverUrl https://intel7.lsf.platform.com/XMLRPC --sslCACert /usr/share/rhn/RHN-ORG-TRUSTED-SSL-CERT

Installing RH HPC on RHEL 5.1

•Once the machine is properly registered, install ‘OCS’

•# yum install ocs

•50 or more packages should be downloaded and installed.

•The install takes roughly 5 to 10 minutes

Configuring RH HPC

Source the OCS environment

# source /etc/profile.d/kusuenv.sh

Run the OCS setup script:

# /opt/kusu/sbin/ocs-setup

If the machine is configured correctly the static network interface will be used for provisioning.

The static network interface will run a DHCP server – ensure this DHCP server will not conflict with other servers.

If everything is okay…continue with the script.

Enter a private DNS domain for the cluster.

The private domain will be used for all nodes provisioned in the cluster

It is the cluster private domain - *NOT AN EXISTING DOMAIN* in your organization or the Internet

Configuring RH HPC

•Continue with the ‘ocs-setup’ script:

•Choose a location for the OS repositories and Images:

•The default is /depot

•The installation script will ask for the RHEL 5.1 media

•The media is used to build the OS repository with all RHEL packages

• Copying the OS media takes approximately 10-15 minutes

•Using ISO media is the fastest way of installing

•CD/DVD media works as well

Red Hat HPC Lab 4

For the instructor and students: now that you have seen an install, install the software on a cluster:

•First install RHEL 5.1

•Configure the server to connect to the Satellite Server

•Install RH HPC

•Configure RH HPC

Option 2: Installing with OCS Media and RHEL 5.1

Pre-installation check: OCS

Minimal hardware requirements for installer node:

•512 MB of physical memory (RAM)

•35 GB disk

•DVD drive

•One Ethernet interface (two interfaces for a Beowulf cluster)

Software required:

•Platform OCS 5 (or Kusu) Installation DVD

•The Platform OCS media contains the kits you need

•One OS kit must be added during installation; make sure that you have OS installation media (or ISOs if using VMware) on hand:

•Fedora Core 6 for x86 or x86_64

•CentOS 5.x for x86 or x86_64

•Red Hat Enterprise Linux 5.x for x86 or x86_64

Pre-installation: network check

All compute nodes should be configured for PXE boot (change the BIOS settings)

The Ethernet switches need to be configured properly:

•Spanning tree should be disabled

•PortFast should be enabled (if supported)

•The switch should be tested prior to the install: everything connected properly, no bad ports

PXE refers to the Preboot Execution Environment supported on most NICs. Kusu and Platform OCS rely on this functionality to ease node management

Pre-installation network information - OCS

The following network-related information will be required during the install process:

Installer node host information:

•Hostname (Fully Qualified Domain Name)

•Static IP address

•Subnet mask

•Gateway address (to public internet)

•DNS server IP address(es)

Private network information:

•Private IP address

•Subnet mask

•Private network name (can be anything)

Don’t guess on any of these settings: messing up means a re-install!

After preliminary data gathering is done, and the installation host is physically connected to the required networks, you are ready to begin.

OCS 5 by default uses a Class ‘B’ address for the cluster network; this can be changed to meet your needs.

You do not need compute hosts available to install the installer; however, it is handy to have at least one compute host to validate the installation.

A step-by-step process for installing the installer node follows:

Installing the OCS 5 Installer host

[Diagram label: Public/Corporate Network]

One more important prerequisite...

These are actually DVDs and CDs – not donuts! – since the installation procedure can take some time, you may want some donuts as well!

Building an OCS 5 Installer Node

1. Boot from CD

2. Choose Language

3. Configure Networks, DNS, & Gateway

4. Root Password

5. Partitioning & LVM

6. Adding Kits

7. Install Summary

8. Installing Packages

9. Installer Node Boot

Installation complete. Installer node is ready to use.

Installation: installer – OCS 5

Insert the OCS 5 media and power up the frontend. At the splash screen, press Enter.

Installation: installer – OCS 5

The installer will guide you through the process of installing the installer node. It is designed to gather information “up front” so that (with the exception of OS kit installation) you can “walk away” after the installation has started.

Installation: installer – OCS 5

Use Tab to move from field to field, the space bar to toggle a choice, and Enter to select a choice.

Installation: installer – OCS 5

Configure eth0 (our private cluster network interface) and eth1 (the public network interface)

Installation: installer – OCS 5

Our private network should be set to the network type “provision” – the network name can be any descriptive name.

Installation: installer – OCS 5

Similarly the public interface should be set to “public”. These network names will appear in the Kusu database and will be used when generating system files such as dhcpd.conf

Installation: installer – OCS 5

Indicate the gateway IP and the IP address of one or more DNS servers.

Installation: installer – OCS 5

The private name will be used by OCS 5 to identify the private cluster domain.

Installation: installer – OCS 5

Set the timezone and provide an NTP server.

Installation: installer – OCS 5

This refers to the Red Hat Network (RHN) installation number. If you have this number, enter it here; otherwise select skip. The installation number guides the installer in selecting components appropriate for a user’s subscription; it is also used in support incidents to verify support status.

Installation: installer – OCS 5

Enter the root password for the installer that will be replicated to the compute hosts (via cfm)

Installation: installer – OCS 5

The next step is to set up our partition table. If you are unsure how to size the partitions, it is recommended that you accept the defaults.

Installation: installer – OCS 5

The default partition setup places four volumes on the volume group “VolGroup00” – a small /boot partition is created at the beginning of the disk as well as a SWAP partition.

Installation: installer – OCS 5

Before proceeding with the installation, we’re asked to confirm the values that we’ve entered.

Installation: installer – OCS 5

Next we are presented with the list of default kits included with OCS 5. We normally need to add an operating system kit; in our example we will add RHEL 5 as an OS kit, as we plan to install this on our compute nodes.

Installation: installer – OCS 5

The OS DVD (or CDs) will be copied to /depot/kits/rhel where the constituent RPMs will be used for package based installs or to create images for image based installs

Installation: installer – OCS 5

If installing from CD, multiple discs may be required.

Installation: installer – OCS 5

The initial repository, called “Repo for rhel 5 x86_x64” in our example (located in /depot/repos/1000), will be created. The repository will contain symbolic links to the packages in the kits that are part of the repository definition.

Installation: installer – OCS 5

Next, Anaconda will begin installing RHEL on our installer node.

Installation: installer – OCS 5

Packages will be installed on the installer node.

Once completed, the installer node will reboot

Installation: installer – OCS 5

The installer scripts in /etc/rc.kusu.d initialize Kusu, loading the table schema for the database from /opt/kusu/sql. “mysqlrunner” is called to run generated SQL statements against the kusudb MySQL database tables, reflecting the installation.

Installation: installer – OCS 5

Images for “imaged” and diskless nodes will be created from the repository definitions and placed in /depot/images. Creating these images (each approximately 4 GB in size) will take several minutes; you can start a separate session (Alt-F2) to monitor progress.

Install Complete:

Verifying your Installer Node

Verifying the Installer Node – OCS 5 & RH HPC

Check for hardware issues

Check the kernel logs for any hardware driver errors:

dmesg

Check the system logs for any startup issues:

less /var/log/messages

Log files for the entire Kusu installation process may be found under /root including “install.log” and “kusu.log”

kusu.log logs the activity of the Python-based scripts that make up the installer

The kickstart file generated by the Anaconda installer is also preserved in /root

Install node Test – OCS 5 & RH HPC

Network: Check that both eth0 and eth1 interfaces are up:

ifconfig

Verify the routing table is correct (route or netstat -r):

•Traffic for the private network is routed over eth0

•Traffic for the public network is routed over eth1

•The default route will go through the gateway server specified during installation

•Multicast packets will be routed over eth0 (using the 224.0.0.0 network)

•External hosts can be reached with the ping command
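The routing checks above can be scripted against the numeric routing table. The following is a sketch under stated assumptions: iface_for_dest is a hypothetical helper, and it assumes the classic net-tools layout printed by route -n, where the destination is the first column and the interface name is the eighth.

```shell
# Hypothetical helper: read "route -n" style output on stdin and print the
# interface of the first route whose Destination column matches $1.
iface_for_dest() {
    awk -v dest="$1" '$1 == dest { print $8; exit }'
}

# Typical use on the installer node:
#   route -n | iface_for_dest 0.0.0.0     # default route, expected on eth1
#   route -n | iface_for_dest 224.0.0.0   # multicast route, expected on eth0
```

Matching on the destination column alone is a simplification; if several routes share a destination, only the first is reported.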

Installer node – OCS 5 & RH HPC

High-performance interconnect:

•Driver for the interconnect hardware

•Use vendor diagnostic tools

RH HPC is not automatically configured to manage InfiniBand

Additional kits for InfiniBand exist from vendors:

•QLogic

•Cisco

Installer test – OCS 5 & RH HPC

Make sure all required services are running

Service           How to check
Web Server        service httpd status
DHCP              service dhcpd status
DNS               service named status
Xinetd            service xinetd status
MySQL database    service mysqld status
NFS               service nfs status
AutoFS            service autofs status
Plone/Zope        service zinstance status
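Checking each service by hand is easy to get wrong, so the status checks listed above can be wrapped in a loop. This is a minimal sketch, not part of OCS: failed_services and service_status are hypothetical names, and the status command is passed in as an argument so the loop can be tried without root or without the services installed.

```shell
# Hypothetical helper: run a checker command for each named service and
# print the names of the services for which the checker fails.
failed_services() {
    checker=$1
    shift
    for svc in "$@"; do
        "$checker" "$svc" >/dev/null 2>&1 || printf '%s\n' "$svc"
    done
}

# Wrapper around the status command used in the slide.
service_status() {
    service "$1" status
}

# Typical use on the installer node (as root):
#   failed_services service_status httpd dhcpd named xinetd mysqld nfs autofs zinstance
```

An empty result means every listed service reported a successful status.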

Check the Platform OCS 5 infrastructure:

repoman -l (ensure OCS commands can query the database)

Check the kits:

kitops -l | more

X Window System:

•Start: startx

•Start a browser to check the cluster homepage and verify installed kits

http://localhost (admin/admin)

http://localhost/nagios (admin/admin)

http://localhost/cacti (admin/admin)

Installer test – OCS 5 & RH HPC

By default, http://localhost/Plone should show the start page for the cluster. Plone is built on the Zope application server; Zope is in the zinstance directory under /opt/Plone-3.0.3

verify web-server - nagios

The Nagios management component will be installed on the installer node by default, accessible at http://localhost/nagios. The default username/password is admin / admin

Post-installation: verify web-server - Cacti

Similarly, Cacti will be installed at http://master/cacti. The default username/password is admin / admin

Platform OCS 5/ RH HPC Lab 5

Notes to students and instructor: verify that RH HPC or OCS is working properly

•Check the web pages: Nagios, Cacti

•Check the repositories: repoman

•Check the installed kits: kitops

•Run ngedit and examine the node groups

•Run genconfig and generate some cluster configuration files

Post installation

Congratulations!

You are ready to install the compute nodes!

Thank You!