8/8/2019 Compute Cluster Deployment Guide
1/35
Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute Cluster
White Paper
Published: June 2007
For the latest information, please see http://www.microsoft.com/windowsserver2003/ccs
Contents
Introduction
Before You Begin
  Plan Your Cluster
  Install Your Cluster Hardware
  Configure Your Cluster Hardware
  Obtain Required Software
Installation, Configuration, and Tuning Steps
  Step 1: Install and Configure the Service Node
  Step 2: Install and Configure ADS on the Service Node
  Step 3: Install and Configure the Head Node
  Step 4: Install the Compute Cluster Pack
  Step 5: Define the Cluster Topology
  Step 6: Create the Compute Node Image
  Step 7: Capture and Deploy Image to Compute Nodes
  Step 8: Configure and Manage the Cluster
  Step 9: Deploy the Client Utilities to Cluster Users
Appendix A: Tuning Your Cluster
Appendix B: Troubleshooting Your Cluster
Appendix C: Cluster Configuration and Deployment Scripts
Related Links
Introduction
High-performance computing is now within reach for many businesses by clustering industry-standard servers. These clusters can range from a few nodes to hundreds of nodes. In the past, wiring, provisioning, configuring, monitoring, and managing these nodes and providing appropriate, secure user access was a complex undertaking, often requiring dedicated support and administration resources. However, Microsoft Windows Compute Cluster Server 2003 simplifies installation, configuration, and management, reducing the cost of compute clusters and making them accessible to a broader audience.
Windows Compute Cluster Server 2003 is a high-performance computing solution that uses
clustered commodity x64 servers that are built with a combination of the Microsoft Windows
Server 2003 Compute Cluster Edition operating system and the Microsoft Compute Cluster
Pack. The base operating system incorporates traditional Windows system management
features for remote deployment and cluster management. The Compute Cluster Pack
contains the services, interfaces, and supporting software needed to create and configure the
cluster nodes, as well as the utilities and management infrastructure. Individuals tasked with
Windows Compute Cluster Server 2003 administration and management have the advantage of working within a familiar Windows environment, which helps enable users to quickly and
easily adapt to the management interface.
Windows Compute Cluster Server 2003 is a significant step forward in reducing the barriers to deployment for organizations and individuals who want to take advantage of the power of a compute clustering solution.
- Integrated software stack. Windows Compute Cluster Server 2003 provides an
integrated software stack that includes operating system, job scheduler, message passing
interface (MPI) layer, and the leading applications for each target vertical.
- Better integration with IT infrastructure. Windows Compute Cluster Server 2003
integrates seamlessly with your current network infrastructure (for example, Active
Directory), enabling you to leverage existing organizational skills and technology.
- Familiar development environment. Developers can leverage existing Windows-based
skills and experience to develop applications for Windows Compute Cluster Server 2003.
Microsoft Visual Studio is the most widely used integrated development environment
(IDE) in the industry, and Visual Studio 2005 includes support for developing HPC
applications, such as parallel compiling and debugging. Third-party hardware and
software vendors provide additional compiler and math library options for developers
seeking an optimized solution for existing hardware. Windows Compute Cluster Server
2003 supports the use of MPI with Microsoft's MPI stack, or the use of stacks from other
vendors.
This step-by-step guide is based on the highly successful cluster deployment at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. The cluster was built as a joint effort between NCSA and Microsoft, using commonly available hardware and Microsoft software. The cluster was composed of 450 x64 servers, achieving 4.1 teraflops (TFLOPs) on 896 processors using the widely accepted LINPACK benchmark. Figure 1 shows the cluster topology used for the NCSA deployment, including the public, private, and MPI networks.
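To put the benchmark figure in perspective, the short Python sketch below divides the quoted LINPACK result by the quoted processor count. It uses only the two numbers stated above; the variable names are illustrative and nothing here comes from the NCSA benchmark tooling.

```python
# Figures quoted in the paragraph above.
linpack_flops = 4.1e12   # 4.1 TFLOPs sustained on LINPACK
processors = 896         # processors used for the benchmark run

# Average sustained throughput per processor, in GFLOPs.
per_processor_gflops = linpack_flops / processors / 1e9

print(f"{per_processor_gflops:.2f} GFLOPs per processor")  # prints "4.58 GFLOPs per processor"
```

This is an average over the whole run, so it says nothing about how evenly the work was spread across nodes.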
[Figure 1: cluster topology diagram. Head node: Windows Server 2003 R2 Standard x64 Edition, Microsoft Compute Cluster Pack, SQL Server 2005 Standard Edition x64. Service node (ADS server): Windows Server 2003 Enterprise Edition x86, Automated Deployment Services, compute node images, DNS and DHCP, Active Directory. Compute node image: Windows Server 2003 Compute Cluster Edition, Microsoft Compute Cluster Pack, all other required applications and drivers installed. Networks shown: Internet, public network (GigE), private network (GigE).]
Figure 1 Supported cluster topology similar to NCSA-deployment topology
Although every IT environment is different, this guide can serve as a basis for setting up your large-scale compute cluster. If you need additional guidance, see the Related Links section at the end of this guide for more resources.
Note
The intended audience for this document is network administrators who have at least two years of experience with network infrastructure, management, and configuration. The example deployment outlined in this document is targeted at clusters in excess of 100 nodes. Although the steps discussed here will work for smaller clusters, they represent steps modeled on large deployments for enterprise-scale and research-scale clusters.
Note
The skill level that is required to complete the steps in this document assumes knowledge of how to install, configure, and manage Microsoft Windows Server 2003 in an Active Directory environment, and experience in adding and managing computers and users within a domain.
Note
This is Version 1 of this document. To download the latest updated version, visit the Microsoft Web site (http://www.microsoft.com/hpc/). The update may contain critical information that was not available when this document was published.
Before You Begin
Setting up a compute cluster with Windows Server 2003 Compute Cluster Edition begins with the following tasks:
1. Plan your cluster.
2. Install your cluster hardware.
3. Configure your cluster hardware.
4. Obtain required software.
When you have completed these tasks, use the steps in the Installation, Configuration, and Tuning Steps section to help you install, configure, and tune your cluster.
Plan Your Cluster
This step-by-step guide provides basic instructions on how to deploy a Windows compute cluster. Your cluster planning should cover the types of nodes that are required for a cluster, and the networks that you will use to connect the nodes. Although the instructions in this guide are based on one specific deployment, you should also consider your environment and the number and types of hardware you have available.
Your cluster requires three types of nodes:
- Head node. A head node mediates all access to the cluster resources and acts as a
single point for cluster deployment, management, and job scheduling. There is only one
head node per cluster.
- Service node. A service node provides standard network services, such as directory and
DNS and DHCP services, and also maintains and deploys compute node images to new
hardware in the cluster. Only one service node is needed for the cluster, although you can
have more than one service node for different roles in the cluster; for example, moving
the image deployment service to a separate node.
- Compute node. A compute node provides computational resources for the cluster.
Compute nodes are provided jobs and are managed by the head node.
Additional node types that can be used but are not required are remote administration nodes and application development nodes. For an overview of device roles in the cluster, see the Windows Compute Cluster Server 2003 Reviewers Guide (http://www.microsoft.com/windowsserver2003/ccs/reviewersguide.mspx).
Your cluster also depends on the number and types of networks used to connect the nodes. The Reviewers Guide discusses the topologies that you can use to connect your nodes, by using combinations of private and public adapters for message passing between the nodes and system traffic among all of the nodes. For the cluster detailed in this guide, the head node and service node have public and private adapters for system traffic, and the compute nodes have private and message passing interface (MPI) adapters. (Note: This is not a supported topology but is very similar to one that is.) Consult the Reviewers Guide for the advantages of each network topology.
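The adapter layout described above can be written down as a small lookup table, which is handy when checking which node types can reach each other on a given network. The following Python sketch is only an illustration: the dictionary keys and the two helper functions are hypothetical names, and the layout reflects this guide's deployment rather than any supported topology definition.

```python
# Adapter layout for the deployment described in this guide
# (role and network names are illustrative labels, not product terms).
ADAPTERS = {
    "head": {"public", "private"},
    "service": {"public", "private"},
    "compute": {"private", "mpi"},
}

def nodes_on(network):
    """Return the node roles that have an adapter on the given network."""
    return sorted(role for role, nets in ADAPTERS.items() if network in nets)

def shared_networks(role_a, role_b):
    """Return the networks over which two node roles can talk directly."""
    return sorted(ADAPTERS[role_a] & ADAPTERS[role_b])

print(nodes_on("private"))                  # every role sits on the private network
print(shared_networks("head", "compute"))   # head and compute nodes share only one network
```

In this layout the private network is the only one common to all three roles, which is why system traffic and image deployment run over it.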
Lastly, you should consider the level of cluster expertise, networking knowledge, and amount of management time available on your staff to dedicate to your cluster. Although deployment and management is simplified with Windows Compute Cluster Server 2003, keep in mind that no matter what the circumstances, a large-scale compute cluster deployment should not be taken lightly. It is important to understand how management and deployment work when planning for the appropriate resources. Compute Cluster Server uses robust, enterprise-grade technologies for all aspects of network and device management. Its management tools and programs allow granular, role-based management of security for cluster administration and cluster users, and its network and system management tools can easily and quickly deploy applications and jobs using familiar, wizard-based interfaces. Additional compute nodes can be added automatically to the compute cluster by simply plugging the nodes in and connecting them to the cluster. Extensive (and expensive) daily hands-on tweaking, configuration, and management are not needed when using commodity hardware and a standards-based infrastructure.
Install Your Cluster Hardware
For ease of management and configuration, all nodes in the deployment in this guide will use the same basic hardware platform. Hardware requirements for computers running Windows Compute Cluster Server 2003 are similar to those for Windows Server 2003, Standard x64 Edition. You can find the system requirements for your cluster at http://www.microsoft.com/windowsserver2003/ccs/sysreqs.mspx. Table 1 shows a list of hardware for all nodes. This list is based on the hardware used in the NCSA deployment.
Table 1: Hardware for All Nodes
Component                      Recommended Hardware
CPU                            Blade servers. Each blade has two single-core 3.2 GHz processors with 2 MB cache and an 800 MHz front-side bus. Motherboard includes 4x PCI Express slots.
RAM                            4 x 1 GB 400 MHz DIMMs. For compute nodes, you should plan on having 2 GB RAM per core.
Storage                        SCSI adapter, 73 GB 10K RPM Ultra320 SCSI disk. RAID may be used on any node, but was not used in this deployment. For the head node, you should plan on having three disks: one for the OS, one for the database, and one for the transaction logs. This will provide improved performance and throughput.
Network Interface Cards        1000 Mb Gigabit Ethernet adapter
                               1x InfiniBand 4x PCI Express adapter
Gigabit Network Hardware       48-port Gigabit switch per rack: 40 ports for blades, 4 for uplink to ring
                               48-port Layer 2 Gigabit switches in ring configuration
InfiniBand Network Hardware    5x 24-port InfiniBand switches per rack
                               2x 96-port InfiniBand switches for cross-rack connectivity
Note
The head node and the network services node each use two Gigabit Ethernet network adapters; both the compute nodes and the head node use the private MPI network, though the head node's MPI interface was disabled for this specific deployment. Also, the service node requires a 32-bit operating system, because ADS works only with 32-bit operating systems, but you can run the operating system on 32-bit or 64-bit hardware. (This is a custom configuration used on the cluster deployment at NCSA and is not supported for general use. However, it is very similar to a supported cluster topology. For more information on supported cluster topologies, please refer to the Windows Compute Cluster Server 2003 Reviewers Guide.)
Configure Your Cluster Hardware
When you have added your switches and blades to the rack, you must configure the network connections and network hardware prior to installing the network software. To configure your hardware, follow the checklist in Table 2.
Table 2: Hardware Configuration Checklist
Check when completed    Configuration Item
[ ]    Connect all high-speed interconnect connections from the pass-through module on the chassis to the rack's high-speed interconnect switches.
[ ]    Connect all Gigabit Ethernet connections from the pass-through module on the chassis to the rack's 48-port Gigabit Ethernet switch.
[ ]    Connect all InfiniBand switches to the Layer 2 switches.
[ ]    Connect all Gigabit Ethernet switches to the Gigabit Ethernet Layer 2 switches.
[ ]    Disable the built-in subnet manager on all switches. The built-in subnet manager doesn't support OpenIB clients, and conflicts with the subnet manager that does support such clients.
[ ]    Change the BIOS boot sequence on all nodes to Network Pre-boot Execution Environment (PXE) first, CD ROM second, and Hard Drive third. For platforms that dynamically remove missing devices at power-up, an efficient way to set the hard drives last in the boot order is to pull the hard drives, power up the devices once, power off the devices, put the drives back in, and then power up again. The boot order will be set correctly thereafter.
[ ]    Disable hyperthreading on all nodes and set the nodes' system clocks to the correct time zone, if required.
[ ]    Obtain a list of all private Gigabit Ethernet adapter MAC addresses for the compute nodes. These addresses are used as input with a configuration script to identify your nodes and configure them with the proper image. In some cases you can use the blade chassis telnet interface to collect the MAC addresses. See Appendix C for a description of the input file and the file format.
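MAC addresses collected from chassis interfaces or labels often arrive in mixed notations, so it can help to normalize and de-duplicate the list before it goes into the input file. The Python sketch below is one way to do that. The target form (12 uppercase hexadecimal digits with no separators) is an assumption made for illustration; the format actually required by the deployment scripts is defined in Appendix C. The function names are hypothetical.

```python
import re

# Assumed target form: exactly 12 uppercase hex digits, no separators.
_MAC_RE = re.compile(r"^[0-9A-F]{12}$")

def normalize_mac(raw):
    """Strip common separators and return the MAC as 12 uppercase hex digits."""
    digits = re.sub(r"[-:.\s]", "", raw).upper()
    if not _MAC_RE.match(digits):
        raise ValueError(f"not a valid MAC address: {raw!r}")
    return digits

def collect_macs(lines):
    """Normalize each non-blank entry and reject duplicates."""
    seen = set()
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the collected list
        mac = normalize_mac(line)
        if mac in seen:
            raise ValueError(f"duplicate MAC address: {mac}")
        seen.add(mac)
        result.append(mac)
    return result

# Example input in three common notations (addresses are made up).
print(collect_macs(["00-0d-60-aa-bb-01", "00:0D:60:AA:BB:02", "000d.60aa.bb03"]))
```

Rejecting duplicates early matters because each address is meant to identify exactly one compute node.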
Obtain Required Software
In addition to Windows Compute Cluster Server 2003, you will need to obtain operating systems, administration utilities, drivers, and Quick Fix files to bring your systems up-to-date. Table 3 lists the software required for each node type, and the notes following the chart show you where to obtain the necessary software. The following list is based on the software used in the NCSA deployment.
Table 3: Software Required by Node Type
Software Required by Node Type    Head Node    Service Node    Compute Node
Windows Server 2003 R2 Standard Edition x64
Windows Server 2003 R2 Enterprise Edition x86
Windows Server 2003 Compute Cluster Edition x64
Microsoft Compute Cluster Pack
SQL Server 2005 Standard Edition x64
Automated Deployment Services (ADS) version 1.1
Microsoft Management Console (MMC) 3.0
.NET Framework 2.0
Windows Preinstallation Environment (WinPE)
QFE KB910481
QFE KB914784
Microsoft System Preparation tool (sysprep.exe)
Cluster configuration and deployment scripts
Latest network adapter drivers
Notes on the software required for the deployment described in this paper:
Microsoft SQL Server 2005 Standard Edition x64: By default, the Compute Cluster Pack will install MSDE on the head node for data and node tracking purposes. Because MSDE is limited to eight concurrent connections, SQL Server 2005 Standard Edition is recommended for clusters with more than 64 compute nodes.
ADS version 1.1: ADS requires 32-bit versions of Windows Server 2003 Enterprise Edition for image management and deployment. Future Microsoft imaging technology (Windows Deployment Services, available in the next release of Windows Server, code name "Longhorn") will support 64-bit software. You can download the latest version of ADS from the Microsoft Web site (http://www.microsoft.com/windowsserver2003/technologies/management/ads/default.mspx).
Because this paper is based on a previous large-scale compute cluster deployment at NCSA, it details using ADS to deploy compute node images as opposed to using Microsoft Windows Deployment Services (WDS). However, future updates to this paper will explain how to use WDS to deploy compute node images to your cluster.
MMC 3.0: MMC 3.0 is required for the administration node, which may or may not be the head node. It is automatically installed by the Compute Cluster Pack on the computer that is used to administer the cluster. You can also download and install the latest versions for Windows Server 2003 and Windows XP x86 and x64 versions at the Microsoft Web site (http://support.microsoft.com/?kbid=907265).
.NET Framework 2.0: The .NET Framework is automatically installed by the Compute Cluster Pack. You can also download the latest version at the Microsoft Web site (http://msdn2.microsoft.com/en-us/netframework/aa731542.aspx).
WinPE: You will need a copy of Windows Preinstallation Environment for Windows Server 2003 SP1. If you need to add your Gigabit Ethernet drivers to the WinPE image, you will need to obtain a copy of the Windows Server 2003 SP1 OEM Preinstallation Kit (OPK), which contains the programs needed to update the WinPE image for your hardware. WinPE and the OPK are available only to customers with enterprise or volume license agreements; contact your Microsoft representative for more information.
QFE KB910481: This Quick Fix is for potential problems when deploying Winsock Direct in a fast Storage Area Network (SAN) environment. You can download the quick fix at the Microsoft Web site (http://support.microsoft.com/?kbid=910481).
QFE KB914784: This Quick Fix is in response to a Security Advisory and provides additional kernel protection in some environments. You can download the quick fix at the Microsoft Web site (http://support.microsoft.com/?kbid=914784).
Sysprep.exe: Sysprep.exe is used to help prepare the compute node image prior to deployment. Sysprep is included as part of Windows Server 2003 Compute Cluster Edition. Note: You must use the x64 version of Sysprep in order to capture and deploy your images.
Cluster configuration and deployment scripts: These scripts are available to download at http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/default.mspx. They include hard-coded paths and require you to follow the installation and usage instructions exactly as described in this guide. If you must modify the scripts for your deployment, verify that the scripts work in your environment before using them to deploy your cluster.
For the scripts to run properly, you will also need specific information about your cluster and its hardware. Appendix C contains a sample input file (AddComputeNodes.csv) that is used to automatically configure the compute cluster nodes and populate Active Directory with node information. Table 4 lists the specific items needed, with room for you to write down the values for your deployment. You can then use this information when building your cluster and when creating your compute node images. Follow the instructions in Appendix C for creating your own sample input file.
Note
Every item in Table 4 must have an entry or the input file will not work properly. If you do not have a value for a field, use a hyphen (-) for the field instead.
Latest network adapter drivers: Contact the manufacturer of your network adapters for the most recent drivers. You will need to install these drivers on your cluster nodes.
Table 4: Cluster Information Needed for Script Input File
Input Value           Your Value  Description
FullName                          Populates the cluster node registry with the Registered Owner name.
Organisation name                 Populates the cluster node registry with the Registered Organization name.
ProductKey                        25-character alphanumeric product key used for all compute cluster nodes. Contact your Microsoft representative for your volume license key.
Server Name                       Populates Active Directory with a Compute Cluster node name.
Srv Description                   Populates the ADS Management console with a text description of the node. Can be used to list rack placement or other helpful information.
Server MAC                        Gigabit Ethernet MAC address for each compute cluster node.
Machine Name                      Used to configure the cluster node with a machine name. Must match the value in the Server Name field.
Admin Password                    Local administrator password.
Domain                            The cluster domain name (for example, HPCCluster.local).
Domain Username                   Account name with permission to add computers to a domain.
Domain Password                   Password for the account with permission to add computers to a domain.
ImageName                         The image name to be installed on the cluster node (for example, CCSImage).
HPC Cluster Name                  The head node name must be used for the cluster name.
NetworkTopology                   Must be Single.
PartitionSize                     Not used.
PublicIP                          Not used.
PublicSubnet                      Not used.
PublicGateway                     Not used.
PublicDNS                         Not used.
PublicNICName                     Not used.
PublicMAC                         Not used.
PrivateIP                         Not used.
PrivateSubnet                     Not used.
PrivateGateway                    Not used.
PrivateDNS                        Not used.
PrivateNICName                    Not used.
PrivateMAC                        Not used.
MPIIP                             Assigns a static address to the MPI adapter (for example, 11.0.0.1).
MPISubnet                         Assigns a subnet mask to the MPI adapter (for example, 255.255.0.0).
MPIGateway                        Not used.
MPIDNS                            Not used.
MPINICName                        Not used.
MPIMAC                            Not used.
MachineOU                         Populates Active Directory with Machine OU information (for example, OU=Cluster Servers,DC=HPCCluster,DC=local).
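Because every field in Table 4 needs an entry, it can be convenient to generate each node's row programmatically and let the code fill in the hyphen placeholders. The Python sketch below illustrates the idea under stated assumptions: the column names and their order are copied from Table 4, but the authoritative layout of AddComputeNodes.csv is the one given in Appendix C, so treat FIELDS, build_row, and to_csv_line as hypothetical helpers to adapt. How the deployment scripts treat quoted values (for example, a MachineOU value that contains commas) is also not confirmed here.

```python
import csv
import io

# Assumption: columns follow the order of Table 4; check Appendix C before use.
FIELDS = [
    "FullName", "Organisation name", "ProductKey", "Server Name",
    "Srv Description", "Server MAC", "Machine Name", "Admin Password",
    "Domain", "Domain Username", "Domain Password", "ImageName",
    "HPC Cluster Name", "NetworkTopology", "PartitionSize",
    "PublicIP", "PublicSubnet", "PublicGateway", "PublicDNS",
    "PublicNICName", "PublicMAC",
    "PrivateIP", "PrivateSubnet", "PrivateGateway", "PrivateDNS",
    "PrivateNICName", "PrivateMAC",
    "MPIIP", "MPISubnet", "MPIGateway", "MPIDNS", "MPINICName", "MPIMAC",
    "MachineOU",
]

def build_row(values):
    """Return one row in FIELDS order, using '-' for any field without a value."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    server = values.get("Server Name")
    machine = values.get("Machine Name")
    # Table 4 states that Machine Name must match Server Name.
    if server and machine and server != machine:
        raise ValueError("Machine Name must match Server Name")
    row = []
    for name in FIELDS:
        value = values.get(name)
        text = "" if value is None else str(value).strip()
        row.append(text if text else "-")
    return row

def to_csv_line(values):
    """Serialize one node's row as a single CSV line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(build_row(values))
    return buf.getvalue()

# Example with made-up values; unspecified fields become hyphens.
node = {
    "Server Name": "NODE001",
    "Machine Name": "NODE001",
    "Domain": "HPCCluster.local",
    "ImageName": "CCSImage",
    "NetworkTopology": "Single",
    "MPIIP": "11.0.0.1",
    "MPISubnet": "255.255.0.0",
}
print(to_csv_line(node), end="")
```

Keeping the field list in one place makes it easy to correct the order once against Appendix C and reuse it for every node.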
Installation, Configuration, and Tuning Steps
To install, configure, and tune a high-performance compute cluster, complete the following steps:
1. Install and configure the service node.
2. Install and configure ADS on the service node.
3. Install and configure the head node.
4. Install the Compute Cluster Pack.
5. Define the cluster topology.
6. Create the compute node image.
7. Capture and deploy image to compute nodes.
8. Configure and manage the cluster.
9. Deploy the client utilities to cluster users.
Step 1: Install and Configure the Service Node
The service node provides all the back-end network services for the cluster, including
authentication, name services, and image deployment. It uses standard Windows technology
and services to manage your network infrastructure. The service node has two Gigabit
Ethernet network adapters and no MPI adapters. One adapter connects to the public network;
the other connects to the private network dedicated to the cluster.
There are five tasks that are required for installation and configuration:
1. Install and configure the base operating system.
2. Install Active Directory, Domain Name System (DNS), and DHCP.
3. Configure DNS.
4. Configure DHCP.
5. Enable Remote Desktop for the cluster.
Install and configure the base operating system. Follow the normal setup procedure for Windows Server 2003 R2 Enterprise Edition, with the exceptions as noted in the following procedure.
To install and configure the base operating system
Boot the computer to the Windows Server 2003 R2 Enterprise Edition CD.
1. Accept the license agreement.
2. On the Partition List screen, create two partitions: one partition of 30 GB, and a
second using the remainder of the space on the hard drive. Select the 30 GB partition
as the install partition, and then press ENTER.
3. On the Format Partition screen, accept the default of NTFS, and then press ENTER.
Proceed with the remainder of the text-mode setup. The computer then reboots into
graphical setup mode.
4. On the Licensing Modes page, select the option for which you are licensed, and then
configure the number of concurrent connections if needed. Click Next.
5. On the Computer Name and Administrator Password page, type a name for the
service node (for example, SERVICENODE). Type your local administrator password
twice, and then press ENTER.
6. On the Networking Settings page, select Custom settings, and then click Next.
7. On the Networking Components page for your private adapter, select Internet
Protocol (TCP/IP), and then click Properties. On the Internet Protocol (TCP/IP)
Properties page, select Use the following IP address. Configure the adapter with a
static nonroutable address, such as 10.0.0.1, and a subnet mask of 255.0.0.0 (8 network bits, leaving 24 bits for host addresses).
Select Use the following DNS server addresses, and then configure the adapter to
use 127.0.0.1. Click OK, and then click Next.
Note: If this computer has a 1394 Net Adapter, it will ask you to set the IP for that
adapter first (before setting TCP/IP properties). Click Next to skip this page
(unnecessary to the cluster deployment) and move on to setting the TCP/IP
properties.
8. Repeat the previous step for the public adapter. Configure the adapter to acquire its
address by using DHCP from the public network. If you prefer, you can assign it a
static address if you have one already reserved. Configure the public adapter to use
127.0.0.1 for DNS queries. Click OK, and then click Next.
9. On the Workgroup or Computer Domain page, accept the default of No and the
default of WORKGROUP, and then click Next. The computer will copy files, and then
reboot.
10. Log in to the server as administrator. Click Start, click Run, type diskmgmt.msc, and
then click OK. The Disk Management console starts.
11. Right-click the second partition on your drive, and then click Format. In the Format
dialog box, select Quick Format, and then click OK. When the format process is
finished, close the Disk Management console.
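The private-adapter settings from step 7 can be sanity-checked before they are typed into setup. The following Python sketch uses the standard ipaddress module with the example address 10.0.0.1 and mask 255.0.0.0 from that step; the variable names are illustrative. The mask 255.0.0.0 corresponds to 8 network bits, which leaves 24 bits for host addresses.

```python
import ipaddress

# Example private-adapter settings from step 7.
iface = ipaddress.ip_interface("10.0.0.1/255.0.0.0")

network = iface.network
print(network)                 # prints "10.0.0.0/8"
print(network.prefixlen)       # prints "8" (network bits in the mask)
print(iface.ip.is_private)     # prints "True" (address is in a private, nonroutable range)

# Host bits available for node addresses on this network.
host_bits = network.max_prefixlen - network.prefixlen
print(host_bits)               # prints "24"
```

The same check works for any other address and mask pair you plan to assign, such as the MPI adapter settings.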
Install Active Directory, DNS, and DHCP. Windows Server 2003 provides a wizard to
configure your server as a typical first server in a domain. The wizard configures your
server as a root domain controller, installs and configures DNS, and then installs and
configures DHCP.
To install Active Directory, DNS, and DHCP
1. Log in to your service node as Administrator. If the Manage Your Server page is not
visible, click Start, and then click Manage Your Server.
2. Click Add or remove a role. The Configure Your Server Wizard starts. Click Next.
3. On the Configuration Options page, select Typical configuration for a first server,
and then click Next.
4. On the Active Directory Domain Name page, type the domain name that will be
used for your cluster and append the .local suffix (for example, HPCCluster.local).
Click Next.
5. On the NetBIOS Domain Name page, accept the default NetBIOS name (for
example, HPCCLUSTER) and click Next. At the Summary of Selections page, click
Next. If the Configure Your Server Wizard prompts you to close any open programs,
click OK.
6. On the NAT Internet Connection page, make sure the public adapter is selected.
Deselect Enable security on the selected interface, and then click Next. If you have more than two network adapters in your computer, the Network Selection page
appears. Select the private LAN adapter and then click Next. Click Finish. After the
files are copied, the server reboots.
7. After the server reboots, log on as Administrator. Review the actions listed in the
Configure Your Server Wizard, and then click Next. Click Finish.
6. On the Network Boot Service Settings page, make sure that Use this path is selected.
Insert the Windows Server 2003 R2 Enterprise Edition x86 CD into the drive. Browse to
the CD drive, or type the drive containing the CD, and then click Next.
7. On the Windows PE Repository page, select Location of Windows PE. Browse to the
folder containing the WinPE binaries (for example, C:\WinPE). In the Repository name
text box, type a name for your repository (for example, NodeImages). Click Next.
8. On the Image Location page, type the path to the folder where the images will be stored.
These must be on the second partition that you created on your server (for example,
E:\Images). The folder will be created and shared automatically. Click Next.
9. If ADS Setup Wizard detects more than one network adapter in your computer, the
Network Settings for ADS Services page is displayed. In the Bind to this IP address
drop-down list, select the IP address that the ADS services will use to distribute images
on the private network, and then click Next.
10. On the Installation Confirmation page, click Install.
11. On the Completing the Automated Deployment Services Setup Wizard page, click
Finish. Close the Automated Deployment Services Welcome dialog box.
12. To open the ADS Management console, click Start, click All Programs, click Microsoft
ADS, and then click ADS Management.
13. Expand the Automated Deployment Services node, and then select Services. In the
center pane, right-click Controller Services, and then click Properties. On the Controller
Service Properties page, select the Service tab, and then change Global job template to
boot-to-winpe. For the Device Identifier, select MAC Address. For the WinPE Repository
Name, type NodeImages or the repository name that you created earlier. Click Apply,
and then click OK.
14. In the ADS Management console, right-click Image Distribution Service, and then click
Properties. Select the Service tab, and ensure that Multicast image deployment is
selected. Click OK.
Share the ADS certificate. ADS creates a computer certificate when it is installed. This certificate is used to identify all computers in the cluster. The certificate must be shared so that the compute node image can import the certificate and then use it during the configuration process.
To share the ADS certificate
1. Click Start, click Administrative Tools, and then click Server Management. The Server
Management console opens.
2. Click Shared Folders, and then click New File Share. The Share a Folder Wizard starts.
Click Next.
3. On the Folder Path page, click Browse, and then browse to
C:\Program Files\Microsoft ADS\Certificate. Click Next.
4. On the Name, Description, and Settings page, accept the defaults, and then click Next.
5. On the Permissions page, accept the defaults, and then click Finish. Click Close, and
then close the Server Management console. The ADS certificate is shared on your
network.
Import ADS templates. ADS includes several templates that are useful when managing your nodes, including reboot-to-winpe and reboot-to-hd. The templates are not installed by default; you must add them to ADS using a batch file. You also need to add the compute cluster templates to ADS so that you can capture and deploy the compute node image on your network.
To import ADS templates
1. Open Windows Explorer and browse to
C:\Program Files\Microsoft ADS\Samples\Sequences.
2. Double-click create-templates.bat. The script file automatically installs the templates in
ADS. Close Windows Explorer.
3. Click Start, click All Programs, click Microsoft ADS, and then click ADS Management.
The ADS Management console opens.
4. Browse to Job Templates. Right-click Job Templates, and then click New Job
Template. The New Job Template Wizard starts. Click Next.
5. On the Template Type page, select An entirely new template, and then click Next.
6. On the Name and Description page, type a name for the compute node capture
template (for example, Capture Compute Node). Type a description (for example, Run
within Windows Server CCE), and then click Next.
7. On the Command Type page, select Task sequence, and then click Next.
8. On the Script or Executable Program page, browse to C:\hpc-ccs\sequences. Select All
files from the Files of type drop-down list. Select Capture-CCS-image-with-winpe.xml,
and then click Open. Click Next.
9. On the Device Destination page, select None, and then click Next. Click Finish. Your
capture template is added to ADS.
10. Repeat steps 4 through 9. In step 6, use Deploy Compute Node and Run from WinPE as
the name and description. In step 8, select the file Deploy-CCS-image-with-winpe.xml.
When finished, you have added the deployment template to ADS.
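The two job templates created in this procedure can be summarized as a small lookup table. The following Python sketch only restates the names, descriptions, and sequence files given in the steps above; the dictionary layout and the helper name are illustrative assumptions and are not an ADS interface.

```python
from pathlib import PureWindowsPath

# Folder that holds the compute cluster task sequences, as given in this guide.
SEQUENCE_DIR = PureWindowsPath(r"C:\hpc-ccs\sequences")

# Template name -> description and task sequence file, taken from the steps above.
JOB_TEMPLATES = {
    "Capture Compute Node": {
        "description": "Run within Windows Server CCE",
        "sequence": "Capture-CCS-image-with-winpe.xml",
    },
    "Deploy Compute Node": {
        "description": "Run from WinPE",
        "sequence": "Deploy-CCS-image-with-winpe.xml",
    },
}


def sequence_path(template_name: str) -> str:
    """Return the full path of the task sequence file for a template (hypothetical helper)."""
    return str(SEQUENCE_DIR / JOB_TEMPLATES[template_name]["sequence"])


if __name__ == "__main__":
    for name in JOB_TEMPLATES:
        print(name, "->", sequence_path(name))
```

Keeping the names in one place helps you select the same template names later when you run the capture and deployment jobs.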
Add devices to ADS. Follow the normal setup procedure for Windows Server 2003 R2 Enterprise Edition, with the exceptions noted later.
To add devices to ADS
1. Populate the ADS server with ADS devices. Click Start, click Run, type cmd.exe, and
then click OK. Change the directory to C:\HPC-CCS\Scripts.
2. Type AddADSDevices.vbs AddComputeNodes-Sample.csv (use the name of your input
file instead of the sample file name). The script will echo the nodes as they are added to
the ADS server. When the script is finished, close the command window.
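Because ADS is configured to identify devices by MAC address, it can help to check your list of node names and MAC addresses for typing errors and duplicates before you build the input file. The Python sketch below is a hypothetical pre-check only. It does not reproduce the input format expected by AddADSDevices.vbs, which is not described in this guide, and the function names and the 12-digit canonical form are assumptions.

```python
import re


def normalize_mac(mac: str) -> str:
    """Strip common separators and return 12 upper-case hex digits."""
    digits = re.sub(r"[-:.\s]", "", mac).upper()
    if not re.fullmatch(r"[0-9A-F]{12}", digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return digits


def find_duplicates(devices):
    """Return (duplicate names, duplicate MACs) found in (name, mac) pairs."""
    seen_names, seen_macs = set(), set()
    dup_names, dup_macs = set(), set()
    for name, mac in devices:
        key = name.upper()          # computer names are compared without case
        mac = normalize_mac(mac)    # raises ValueError on malformed input
        (dup_names if key in seen_names else seen_names).add(key)
        (dup_macs if mac in seen_macs else seen_macs).add(mac)
    return sorted(dup_names), sorted(dup_macs)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    inventory = [("NODE001", "00-11-22-AA-BB-01"), ("NODE002", "00:11:22:aa:bb:02")]
    print(find_duplicates(inventory))
```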
If your company uses a proxy server to connect to the Internet, you should configure your server so that it can receive system and application updates from Microsoft.
1. To configure your proxy server settings, open Internet Explorer. Click Tools, and then click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.
When you have finished configuring your server, click Start, click All Programs, and then click Windows Update. This will ensure that your server is up-to-date with service packs and software updates that may be needed to improve performance and security.
Step 3: Install and Configure the Head Node
The head node is responsible for managing the compute cluster nodes, performing job control, and acting as the gateway for submitted and completed jobs. It requires SQL Server 2005 Standard Edition as part of the underlying service and support structure. You should consider using three hard drives for your head node: one for the operating system, one for the SQL Server database, and one for the SQL Server transaction logs. This will provide reduced drive contention, better overall throughput, and some transactional redundancy should the database drive fail.
In some cases, enabling hyperthreading on the head node will also result in improved performance for heavily loaded SQL Server applications.
There are two tasks that are required for installing and configuring your head node:
1. Install and configure the base operating system.
2. Install and configure SQL Server 2005 Standard Edition.
To install and configure the base operating system
1. On the head node computer, boot to the Windows Server 2003 R2 Standard Edition
x64 CD.
2. Accept the license agreement.
3. On the Partition List screen, create two partitions: one partition of 30 GB, and a
second that uses the remainder of the space on the hard drive. Select the 30 GB
partition as the install partition, and then press ENTER.
4. On the Format Partition screen, accept the default of NTFS, and then press ENTER.
Proceed with the remainder of the text-mode setup. The computer then reboots into
graphical setup mode.
5. On the Licensing Modes page, select the option for which you are licensed, and then
configure the number of concurrent connections, if needed. Click Next.
6. On the Computer Name and Administrator Password page, type a name for the
head node (for example, HEADNODE). Type the account with permission to join a
computer to the domain (for example, hpccluster\administrator), type the password
twice, and then press ENTER.
7. On the Networking Settings page, select Typical settings, and then click Next. This
will automatically assign addresses to your public and private adapters. If you want to
use static IP addresses for either interface, select Custom Settings, and then click
Next. Follow the steps that you used to configure your service node adapter settings.
8. On the Workgroup or Computer Domain page, select Yes, make this computer a
member of a domain. Type the name of your cluster domain (for example,
HPCCluster.local), and then click Next. When prompted, type the name and the
password for an account that has permission to add computers to the domain
(typically, the Administrator account), and then click OK. Note: If your network adapter
drivers are not included on the Windows Server 2003 CD, then you will not be able to
join a domain at this time. Instead, make the computer a member of a workgroup,
complete the rest of setup, install your network adapters, and then join your head
node to the domain.
When you have configured the base operating system, you can install SQL Server 2005 Standard Edition on your head node.
To install and configure SQL Server 2005 Standard Edition
1. Log on to your server as Administrator. Insert the SQL Server 2005 Standard Edition x64
CD into the head node. If setup does not start automatically, browse to the CD drive and
then run setup.exe.
2. On the End User License Agreement page, select I accept the licensing terms and
conditions, and then click Next.
3. On the Installing Prerequisites page, click Install. When the installations are complete,
click Next. The Welcome to the Microsoft SQL Server Installation Wizard starts. Click
Next.
4. On the System Configuration Check page, the installation program displays a report
with potential installation problems. You do not need to install IIS or address any IIS-
related warnings because IIS is not used in this deployment. Click Next.
5. On the Registration Information page, complete the Name and Company fields with the
appropriate information, and then click Next.
6. On the Components to Install page, select all check boxes, and then click Next.
7. On the Instance Name page, select Named instance, and then type
COMPUTECLUSTER in the text box. The instance must have this name, or Windows
Compute Cluster will not work. Click Next.
8. On the Service Account page, select Use the built-in System account, and then select
Local system in the drop-down list. In the Start services at the end of setup section,
select all options except SQL Server Agent, and then click Next.
9. On the Authentication Mode page, select Windows Authentication Mode. Click Next.
10. On the Collation Settings page, select SQL collations, and then select Dictionary
order case-insensitive for use with 1252 Character Set from the drop-down list. Click
Next.
11. On the Error and Usage Report Settings page, click Next.
12. On the Ready to Install page, click Install. When the Setup Progress page appears,
click Next.
13. On the Completing Microsoft SQL Server 2005 Setup page, click Finish.
14. Open the Disk Management console. Click Start, click Run, type diskmgmt.msc, and then
click OK.
15. Right-click the second partition on your drive, and then click Format. In the Format dialog
box, select Quick Format, and then click OK. When the format process finishes, close
the Disk Management console.
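The instance name typed on the Instance Name page matters because a named SQL Server instance is addressed as server\instance. The Python sketch below composes that name for the example head node used in this guide. The helper name is hypothetical, and the check simply enforces the COMPUTECLUSTER requirement stated in the procedure.

```python
REQUIRED_INSTANCE = "COMPUTECLUSTER"  # instance name required by this guide


def instance_path(head_node: str, instance: str = REQUIRED_INSTANCE) -> str:
    """Return the server\\instance name for the head node's SQL Server instance."""
    if instance.upper() != REQUIRED_INSTANCE:
        raise ValueError(
            f"instance must be named {REQUIRED_INSTANCE}, got {instance!r}"
        )
    return f"{head_node}\\{REQUIRED_INSTANCE}"


if __name__ == "__main__":
    print(instance_path("HEADNODE"))
```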
If your company uses a proxy server to connect to the Internet, you should configure your head node so that it can receive system and application updates from Microsoft.
1. To configure your proxy server settings, open Internet Explorer. Click Tools, and then click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.
When you have finished configuring your server, click Start, click All Programs, and then click Windows Update. This will ensure that your server is up-to-date with service packs and software updates that may be needed to improve performance and security. You should elect to install Microsoft Update from the Windows Update page. This service provides service packs and updates for all Microsoft applications, including SQL Server. Follow the instructions on the Windows Update page to install the Microsoft Update service.
Step 4: Install the Compute Cluster Pack
When the head node has been configured, you can install the Compute Cluster Pack, which contains the services, interfaces, and supporting software needed to create and configure cluster nodes. It also includes utilities and management infrastructure for your cluster.
To install the Compute Cluster Pack
1. Insert the Compute Cluster Pack CD into the head node. The Microsoft Compute
Cluster Pack Installation Wizard appears. Click Next.
2. On the Microsoft Software License Terms page, select I accept the terms in the
license agreement, and then click Next.
3. On the Select Installation Type page, select Create a new compute cluster with this
server as the head node. Do not use the head node as a compute node. Click Next.
4. On the Select Installation Location page, accept the default. Click Next.
5. On the Install Required Components page, a list of required components for the
installation appears. Each component that has been installed will appear with a check
next to it. Select a component without a check, and then click Install.
6. Repeat the previous step for all uninstalled components. When all of the required
components have been installed, click Next. The Microsoft Compute Cluster Pack
Installation Wizard completes. Click Finish.
Step 5: Define the Cluster Topology
After the Compute Cluster Pack installation for the head node is complete, a Cluster Deployment Tasks window appears with a To Do List. In this procedure, you will configure the cluster to use a network topology that consists of a single private network for the compute nodes and a public interface from the head node to the rest of the network.
To define the cluster topology
1. On the To Do List page, in the Networking section, click Configure Cluster Network
Topology. The Configure Cluster Network Topology Wizard starts. Click Next.
2. On the Select Setup Type page, select Compute nodes isolated on private network
from the drop-down list. A graphic appears that shows you a representation of your
network. You can learn more about the different network topologies by clicking the Learn
more about this setup link. When you have reviewed the information, click Next.
3. On the Configure Public Network page, select the correct public (external) network
adapter from the drop-down list. This network will be used for communicating between the
cluster and the rest of your network. Click Next.
4. On the Configure Private Network page, select the correct private (internal) adapter
from the drop-down list. This network will be used for cluster management and node
deployment. Click Next.
5. On the Enable NAT Using ICS page, select Disable Internet Connection Sharing for
this cluster. Click Next.
6. Review the summary page to ensure that you have chosen an appropriate network
configuration, and then click Finish. Click Close.
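In this topology the head node has one adapter on the public network and one on the private cluster network, so the two adapters should sit on different, non-overlapping subnets. The following Python sketch checks that with the standard ipaddress module. The function name and the sample addresses are assumptions for illustration; substitute the addresses actually assigned to your adapters.

```python
import ipaddress


def networks_are_separate(public_cidr: str, private_cidr: str) -> bool:
    """Return True if the two adapter addresses belong to non-overlapping subnets.

    Each argument is an adapter address with prefix length, such as "10.0.0.1/16".
    """
    public = ipaddress.ip_interface(public_cidr).network
    private = ipaddress.ip_interface(private_cidr).network
    return not public.overlaps(private)


if __name__ == "__main__":
    # Hypothetical head node addresses: public adapter, then private adapter.
    print(networks_are_separate("172.16.5.10/24", "10.0.0.1/16"))
```

If the check fails, review the adapter selections on the Configure Public Network and Configure Private Network pages before you deploy any nodes.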
Step 6: Create the Compute Node Image
You can now create the compute node image that will be captured and deployed to each of
the compute nodes. There are three tasks that are required to create the compute node image:
1. Install and configure the base operating system.
2. Install and configure the ADS agent and Compute Cluster Pack.
3. Update the image and prepare it for deployment.
To install and configure the base operating system
1. Start the node that you want to use to create your compute node image. Insert the
Microsoft Windows Server 2003 Compute Cluster Edition CD into the CD drive.
Text-mode setup launches automatically.
2. Accept the license agreement.
3. On the Partition List screen, create one partition of 16 GB. Select the 16 GB partition as
the install partition, and then press ENTER.
4. On the Format Partition screen, accept the default of NTFS, and then press ENTER.
Proceed with the remainder of the text-mode setup. The computer then reboots into
graphical setup mode.
5. On the Licensing Modes page, select the option for which you are licensed, and then
configure the number of concurrent connections, if needed. Click Next.
6. On the Computer Name and Administrator Password page, type a name for the
compute node that has not been added to ADS (for example, NODE000). Type your local
administrator password twice, and then press ENTER.
7. On the Networking Settings page, select Typical settings, and then click Next. This will
automatically assign addresses to your public and private adapters. The adapter
information for the deployed nodes will be automatically created when the image is
deployed to a node.
8. On the Workgroup or Computer Domain page, select Yes, make this computer a
member of a domain. Type the name of your cluster domain (for example, HPCCluster),
and then click Next. When prompted, type the name and the password for an account
that has permission to add computers to the domain (for example,
hpccluster\administrator), and then click OK. The computer will copy files, and then
reboot. Note: If your network adapter drivers are not included on the Windows Server
2003 Compute Cluster Edition CD, then you will not be able to join a domain at this time.
Instead, make the computer a member of a workgroup, complete the rest of setup, install
your network adapters, and then join your compute node to the domain.
9. Log on to the node as administrator.
10. Copy the QFE files to your compute node. Run each executable and follow the
instructions for installing the quick fix files on your server.
11. Open Regedit. Click Start, click Run, type regedit, and then click OK.
12. Browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.
Right-click in the right pane. Click New, and then click DWORD value. Type
SynAttackProtect (case sensitive), and then press ENTER.
13. Double-click the new value that you just created. Confirm that the value data is zero, and
then click OK.
14. Right-click in the right pane. Click New, and then click DWORD value. Type
TcpMaxDataRetransmissions (case sensitive), and then press ENTER.
15. Double-click the new value that you just created. In the Value data text box, type 20.
Ensure that Base is set to Hexadecimal, and then click OK.
16. Close Regedit.
17. Disable any network interfaces that will not be used by the cluster, or that do not have
physical network connectivity.
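The two registry values created in the Regedit steps above can also be written down as a registration (.reg) file for review. The Python sketch below generates that text. It is a minimal illustration, not a replacement for the documented procedure: the function and variable names are hypothetical, and the only data it uses are the key path and the two values given in the steps. Note that a value data of 20 with Base set to Hexadecimal corresponds to decimal 32.

```python
# Registry key from the procedure above.
TCPIP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# DWORD values as entered in Regedit: 0, and 20 with Base set to Hexadecimal.
TUNING_VALUES = {
    "SynAttackProtect": 0x0,
    "TcpMaxDataRetransmissions": 0x20,  # hexadecimal 20 = decimal 32
}


def render_reg_file(key: str, values: dict) -> str:
    """Return .reg file text that sets the given DWORD values under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, data in values.items():
        # DWORD data in a .reg file is written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{data:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_reg_file(TCPIP_KEY, TUNING_VALUES))
```

Generating the text first lets you compare it against the values shown in Regedit on the image node before the image is captured.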
When you have configured the base operating system, you can then install and configure the ADS Agent and the Compute Cluster Pack on your image.
To install and configure the ADS Agent and Compute Cluster Pack
1. Copy the ADS binaries to a folder on the compute node. Browse to the folder, and then
run ADSSetup.exe.
2. A Welcome page appears. Click Install ADS Administration Agent. The Administration
Agent Setup Wizard starts. Click Next.
3. On the License Agreement page, select I accept the terms of the license agreement,
and then click Next.
4. On the Configure Certificates page, select Now. Type the fully-qualified path to the
certificate share on the service node (for example, \\servicenode\Certificate\adsroot.cer).
Click Next.
5. On the Configure the Agent Logon Settings page, select None, and then click Next.
6. On the Installation Confirmation page, click Install.
7. On the Completing the Administration Agent Setup Wizard page, click Finish. Close
the Automated Deployment Services Welcome page.
8. Insert the Compute Cluster Pack CD into the compute node. The Microsoft Compute
Cluster Pack Installation Wizard appears. Click Next.
9. On the Microsoft Software License Terms page, select I accept the terms in the
license agreement, and then click Next.
10. On the Select Installation Type page, select Join this server to an existing compute
cluster as a compute node. Type the name of the head node in the text box (for
example, HEADNODE). Click Next.
11. On the Select Installation Location page, accept the default. Click Next.
12. On the Install Required Components page, a list of required components for the
installation appears. Each component that has been installed will appear with a check
next to it. Select a component without a check, and then click Install.
13. Repeat the previous step for all uninstalled components. When all of the required
components have been installed, click Next. When the Microsoft Compute Cluster Pack
completes, click Finish.
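The path typed on the Configure Certificates page is a UNC path made up of the service node name, the share name, and the certificate file name. The Python sketch below assembles it with pathlib so that no stray spaces or separators creep in. The helper name is hypothetical, and the default share and file names are taken from the example path in this guide; your share name may differ if you changed the defaults in the Share a Folder Wizard.

```python
from pathlib import PureWindowsPath


def certificate_path(service_node: str,
                     share: str = "Certificate",
                     file_name: str = "adsroot.cer") -> str:
    """Return the UNC path of the shared ADS certificate on the service node."""
    # PureWindowsPath treats \\server\share as the drive part of a UNC path.
    return str(PureWindowsPath(f"\\\\{service_node}\\{share}") / file_name)


if __name__ == "__main__":
    print(certificate_path("servicenode"))
```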
When you have installed and configured the ADS Agent and Compute Cluster Pack, you can update your image with the latest service packs, and then prepare your image for deployment.
To update the image and prepare it for deployment
1. Run the Windows Update service on your compute node. If your cluster lies behind a
proxy server, configure Internet Explorer with your proxy server settings. For information
on how to do this, see Step 1: Install and Configure the Service Node, earlier in this
guide.
2. Run the Disk Cleanup utility. Click Start, click All Programs, click Accessories, click
System Tools, and then click Disk Cleanup. Select the C: drive, and then click OK.
Select all of the check boxes, and then click OK. When the cleanup utility is finished,
close the utility.
3. Run the Disk Defragmenter utility. Click Start, click All Programs, click Accessories,
click System Tools, and then click Disk Defragmenter. Select the C: drive, and then
click Defragment. When the defragmentation utility is finished, close the utility.
Step 7: Capture and Deploy Image to Compute Nodes
You can now capture the compute node image that you just created. You can then deploy the image to compute nodes on your cluster.
To capture the compute node image
1. If the compute node is not running, turn on the computer and wait for the node to boot into
Windows Server 2003 Compute Cluster Edition.
2. Log on to the service node as administrator. Click Start, and then click ADS
Management. Right-click Devices, and then click Add Device.
3. In the Add Device dialog box, type a name in the Name text box (for example, Node000),
a description for your node (for example, Compute Node Image), and then type the MAC
address for the node that is running the compute node image. Click OK. The status pane
will indicate that the node was created successfully. Click Cancel to close the dialog box.
4. Right-click your compute node name. Click Properties, and then click the User Variables
tab.
5. Click Add. In the Variables dialog box, in the Name text box, type Imagename. In the
Value text box, type a name for your image (for example, CCSImage). Click OK twice.
6. Right-click the compute node device again, and then click Properties. In the WinPE
repository name text box, type the name for your repository that you defined when you
installed ADS (for example, NodeImages). Click Apply, and then click OK.
7. Right-click the compute node that you just added, and then click Take Control.
8. Right-click the compute node device again, and then click Run job. The Run Job Wizard
starts. Click Next.
9. On the Job Type page, select Use an existing job template, and then click Next.
10. On the Template Selection page, select Capture Compute Node. Click Next.
11. On the Completing the Run Job Wizard page, click Finish. A Created Jobs dialog box
appears. Click OK. The ADS Agent on your compute node runs the job, using Sysprep to
prepare and configure the node image, and then using the ADS image capture functions
to create and copy the image to ADS. When the image capture is complete, the node
boots into WinPE.
Deploy the image to nodes on the cluster. When you have captured the compute node image to the service node, you can deploy the image to compute nodes on the cluster.
To deploy the image to nodes on the cluster
1. Log on to the service node as administrator. Click Start, click All Programs, click
Microsoft ADS, and then click ADS Management.
2. Expand the Automated Deployment Services node, and then select Devices.
3. Select all devices that appear in the right pane, right-click on the selected devices,
and then select Take Control. The Control Status changes to Yes.
4. Right-click on the devices, and then click Run job.
5. The Run Job Wizard appears. Click Next.
6. On the Job Type page, select Use an existing job template. Click Next.
7. On the Template Selection page, select boot-to-winpe. Click Next.
8. On the Completing the Run Job Wizard page, click Finish.
9. Boot the compute nodes. The network adapters should already be configured to use
PXE and obtain the WinPE image from the service node. To avoid overwhelming the
ADS server during unicast deployment of the WinPE image, it is recommended that you
boot only four nodes at a time. Boot each subsequent set of four nodes only after
all nodes in the previous sets show Connected to WinPE status in the ADS
Management window on the service node.
10. After all the nodes are connected to WinPE, you can deploy the compute node image
to those nodes. Right-click the devices, and then click Run job.
11. The Run Job Wizard appears. Click Next.
12. On the Job Type page, select Use an existing job template. Click Next.
13. On the Template Selection page, select the deployment template that you created
earlier (for example, Deploy Compute Node). Click Next.
14. On the Completing the Run Job Wizard page, click Finish. The nodes automatically
download and run the image. This task will take a significant amount of time,
especially when you are installing hundreds of nodes. Depending on your available
staff, you may want to run this as an overnight task. When finished, your nodes are
joined to the domain and ready to be managed by the head node.
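The recommendation to boot only four nodes at a time amounts to splitting the node list into batches of four and finishing each batch before starting the next. The Python sketch below shows the batching step only. The function name and the node names are hypothetical, and powering on the nodes and checking for Connected to WinPE status remain manual steps in the ADS Management console.

```python
from itertools import islice


def batches(nodes, size=4):
    """Yield successive lists of at most `size` nodes, preserving order."""
    iterator = iter(nodes)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Hypothetical node names for illustration.
    node_names = [f"NODE{i:03d}" for i in range(1, 11)]
    for number, group in enumerate(batches(node_names), start=1):
        # Boot this group, then wait until every node in it shows
        # Connected to WinPE before moving on to the next group.
        print(f"Batch {number}: {', '.join(group)}")
```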
Step 8: Configure and Manage the Cluster
The head node is used to manage and maintain your cluster once the node images have been deployed. The Compute Cluster Pack includes a Compute Cluster Administrator console that simplifies management tasks, including approving nodes on the cluster and adding users and administrators to the cluster. The console includes a To Do List that shows you which tasks have been completed. Follow these steps to configure and manage your cluster:
1. Disable Windows Firewall on all nodes in the cluster.
2. Approve nodes that have joined the cluster.
3. Add users and administrators to the cluster.
Disable Windows Firewall on all nodes on the cluster. The Compute Cluster Administrator console enables you to define how the firewall is configured on all cluster node network adapters. For best performance on large-scale deployments, it is recommended that you disable Windows Firewall on all interfaces.
To disable Windows Firewall on all nodes on the cluster
1. Click Start, click All Programs, click Microsoft Compute Cluster Pack, and then click
Compute Cluster Administrator.
2. Click the To Do List. In the Networking section in the results pane, click Manage
Windows Firewall Settings. The Manage Windows Firewall Wizard starts. Click Next.
3. On the Configure Firewall page, select Disable Windows Firewall, and then click Next.
4. On the View Summary page, click Finish. On the Result page, click Close. When
compute nodes are approved to join the cluster, the firewall will be disabled.
Approve nodes that have joined the cluster. When you deploy Compute Cluster Edition nodes, they have joined the cluster but have not been approved to participate or process any jobs. You must approve them before they can receive and process jobs from your users.
To approve nodes that have joined the cluster
1. Open the Compute Cluster Administrator console. Click Node Management.
2. In the results pane, select one or more nodes that display a status of Pending Approval.
3. In the task pane, click Approve. You can also right-click the selected nodes and then click
Approve.
4. When the nodes are approved, the status changes to Paused. You can leave the nodes in
Paused status, or in the task pane you can click Resume to enable the node to receive
jobs from your users.
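The approval flow described above is a small state machine: a node starts in Pending Approval, moves to Paused when approved, and can accept jobs after Resume. The Python sketch below models those transitions. The function name is hypothetical, and the status label "Ready" for a resumed node is an assumption, because this guide does not name the status shown after Resume.

```python
# (current status, action) -> new status
TRANSITIONS = {
    ("Pending Approval", "Approve"): "Paused",
    ("Paused", "Resume"): "Ready",  # "Ready" is an assumed label, not from this guide
}


def apply_action(status: str, action: str) -> str:
    """Return the node status after an action, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a node that is {status!r}") from None


if __name__ == "__main__":
    status = "Pending Approval"
    for action in ("Approve", "Resume"):
        status = apply_action(status, action)
        print(action, "->", status)
```

The model makes one point explicit: a node cannot be resumed until it has been approved.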
Add users and administrators to your cluster. In order to use and maintain the cluster, you must add cluster users and administrators to your cluster domain. This will make it possible for others to submit jobs to the cluster, and to perform routine administration and maintenance on the cluster. If your organization uses Active Directory, you will need to create a trust relationship between your cluster domain and other domains in your organization. You will also need to create organizational units (OUs) in your cluster domain that will act as containers for other OUs or users from your organization. You may need to work with other groups in your company to create the necessary security groups so that you can add users from other domains to your compute cluster domain. Because each organization is unique, it is not possible to provide step-by-step instructions on how to add users and administrators to the cluster domain. For help and information on how best to add users and administrators to your cluster, see Windows Server Help.
To add users and administrators to your cluster
1. In the Compute Cluster Administrator, click the To Do List. In the results pane, click
Manage Cluster Users and Administrators. The Manage Cluster Users Wizard starts.
Click Next.
2. On the Cluster Users page, the default group of HPCCLUSTER\Domain Users has been
added for you. Type a user or group by using the format domain\user or domain\group,
and then click Add. You can add or remove users and groups using the Add and
Remove buttons. When you have finished adding or removing users and groups, click
Next.
3. On the Cluster Administrators page, the default group of HPCCLUSTER\Domain
Admins has been added for you. Type a user or group using the format domain\user or
domain\group, and then click Add. You can add or remove users and groups by using the
Add and Remove buttons. When you have finished adding or removing users and
groups, click Next.
4. On the View Summary page, click Next.
5. On the Result page, click Close.
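The wizard expects each entry in the format domain\user or domain\group. If you prepare the list of accounts in advance, a small check can catch entries that are missing the domain part. The Python sketch below is a hypothetical pre-check only: the function name is an assumption, and it validates the format, not whether the account exists in the domain.

```python
def parse_account(entry: str):
    """Split a domain\\name entry into (DOMAIN, name), or raise ValueError."""
    domain, separator, name = entry.strip().partition("\\")
    if not separator or not domain or not name or "\\" in name:
        raise ValueError(f"expected domain\\user or domain\\group, got {entry!r}")
    return domain.upper(), name


if __name__ == "__main__":
    for entry in (r"HPCCLUSTER\Domain Users", r"HPCCLUSTER\Domain Admins"):
        print(parse_account(entry))
```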
Step 9: Deploy the Client Utilities to Cluster Users
The Compute Cluster Administrator and the Compute Cluster Job Manager are installed on the head node by default. If you install the client utilities on a remote workstation, an administrator can manage clusters from that workstation. If you install the Compute Cluster Administrator or Job Manager on a remote computer, the computer must have one of the following operating systems installed:
Windows Server 2003, Compute Cluster Edition
Windows Server 2003, Standard x64 Edition
Windows Server 2003, Enterprise x64 Edition
Windows XP Professional x64 Edition
Windows Server 2003 R2 Standard x64 Edition
Windows Server 2003 R2 Enterprise x64 Edition
In addition, Windows Compute Cluster Server 2003 requires the following:
Microsoft .NET Framework 2.0
Microsoft Management Console (MMC) 3.0 to run the Compute Cluster Administrator snap-in
SQL Server 2000 Desktop Engine (MSDE) to store all job information
The last step in the Windows Compute Cluster Server 2003 deployment process is to create
an administrator or operator console.
To deploy the client utilities
1. On the workstation that is running the appropriate operating system, insert the Compute Cluster Pack CD. The Microsoft Compute Cluster Pack Installation Wizard is automatically launched. Click Next.
2. On the Microsoft Software License Terms page, select I accept the terms in the license agreement, and click Next.
3. On the Select Installation Type page, select Install only the Microsoft Compute Cluster Pack Client Utilities for the cluster users and administrators, and then click Next.
4. On the Select Installation Location page, accept the default location, and then click Next.
5. On the Install Required Components page, highlight any components that are not installed, and then click Install.
6. When the installation is finished, a window appears that says Microsoft Compute Cluster Pack has been successfully installed. Click Finish.
Please note that for an administration console, you should install only the client utilities. For a development workstation, you should install both the software development kit (SDK) utilities and the client utilities.
Appendix A: Tuning Your Cluster
Each cluster is created with a different goal in mind; therefore, there is a different way to tune each cluster for optimal performance. However, some basic guidelines can be established. To achieve performance improvements, you can do some planning, but testing will also be crucial. For testing, it is important to use applications and data that are as close as possible to the ones that the cluster will ultimately use. In addition to the specific use of the cluster, its projected size will be another basis for making decisions. After you deploy the applications, you can work on tuning the cluster appropriately.
The best networking solution will depend on the nature of your application. Although there are many different types of applications, they can be broadly categorized as message-intensive and embarrassingly parallel. In message-intensive applications, each node's job is dependent on other nodes. In some situations, data is passed between nodes in many small messages, meaning that latency is the limiting factor. With latency-sensitive applications, high-performance networking interfaces, such as Winsock Direct, are crucial. In addition, the use of high-quality routers and switches can improve performance with these applications.
In some messaging situations, large messages are passed infrequently, meaning that bandwidth is the limiting factor. A specialty network, such as InfiniBand or Myrinet, will meet these high-bandwidth requirements. If network latency is not an issue, a gigabit Ethernet network adapter might be the best choice.
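The distinction between latency-bound and bandwidth-bound messaging can be illustrated with a simple cost model in which the time to deliver one message is a fixed latency plus the message size divided by the bandwidth. The following Python sketch is only an illustration of that reasoning; the function names and the numbers used (50 microseconds of latency, 125 MB/s of bandwidth) are assumptions chosen for the example, not measurements of any particular interconnect.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to deliver one message: fixed latency plus serialization time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def limiting_factor(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Return 'latency' or 'bandwidth', whichever contributes more to the transfer time."""
    serialization_s = message_bytes / bandwidth_bytes_per_s
    return "latency" if latency_s >= serialization_s else "bandwidth"


# Illustrative numbers only (assumptions, not measurements of a real network).
LATENCY_S = 50e-6            # 50 microseconds per message
BANDWIDTH = 125_000_000.0    # 125 MB/s, roughly the line rate of gigabit Ethernet

small = limiting_factor(1_000, LATENCY_S, BANDWIDTH)        # many small messages
large = limiting_factor(10_000_000, LATENCY_S, BANDWIDTH)   # infrequent large messages
print(small, large)
```

With these assumed values, a 1 KB message is dominated by latency and a 10 MB message is dominated by bandwidth, which mirrors the two messaging situations described above. Measured values from your own network should replace the assumed ones before drawing conclusions.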
In embarrassingly parallel applications, each node processes data independently with little message passing. In this case, the total number of nodes and the efficiency of each node are the limiting factors. It is important to be able to fit the entire dataset into RAM. This will result in much faster performance, as the data will not have to be paged in and out from the disk during processing. The speed of the processors and the type and number of nodes are prime concerns. If the processors are dual-core or quad-core, this may not be as efficient as having separate processors, each with its own memory bus. In addition, if hyper-threading is available, it may be advantageous to turn this feature off. Hyper-threading is used when applications are not using all CPU cycles, so we have them run on a single processor. This works for regular scenarios, but in high-performance computing, all CPU resources are used, so having all processes on a single processor has the opposite effect: they have to wait to get resources. Hyper-threading is generally bad for high-performance computing applications, but not necessarily all of them. So long as the operating system kernel is hyper-threading-aware, the floating-point-intensive processes will be balanced across physical cores. For multi-threaded applications that may have I/O-intensive threads and floating-point-intensive threads, hyper-threading could be a benefit. Hyper-threading was disabled at NCSA because none of the applications were floating-point intensive, and no specific thread-balancing or kernel-tuning was performed. If the application were actually perfectly parallel, each extra node would increase performance linearly.
For each application, there is a maximum number of processors that will increase performance. Above that number, each processor adds no value, and could even decrease performance. This is referred to as application scaling. Depending on the system architecture, all cores sometimes divide available bandwidth to memory, and they certainly always divide the network bandwidth. One of these three (CPU, network, or memory bandwidth) is the performance bottleneck with any application. If the nature of the application(s) is known, you can determine in advance the optimal cluster specifications that will match the application. You should work with your application vendor to ensure that you have the optimal number of processors.
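The idea that performance stops improving beyond a certain processor count can be sketched with a toy model. The Python example below assumes a run time made of a fixed serial part, a parallel part that shrinks as processors are added, and a communication overhead that grows with the processor count. This model, its function names, and the numbers are assumptions made purely for illustration; real applications should be measured rather than modeled this simply.

```python
def run_time(processors, serial_s, parallel_s, overhead_per_proc_s):
    """Toy model: serial part + evenly divided parallel part + growing overhead."""
    return serial_s + parallel_s / processors + overhead_per_proc_s * processors


def best_processor_count(max_processors, serial_s, parallel_s, overhead_per_proc_s):
    """Return the processor count (1..max_processors) with the lowest modeled run time."""
    return min(
        range(1, max_processors + 1),
        key=lambda p: run_time(p, serial_s, parallel_s, overhead_per_proc_s),
    )


# Hypothetical workload: 10 s serial, 400 s parallel, 0.25 s overhead per processor.
best = best_processor_count(256, 10.0, 400.0, 0.25)
print(best, run_time(best, 10.0, 400.0, 0.25), run_time(80, 10.0, 400.0, 0.25))
```

In this assumed workload the modeled run time is lowest at 40 processors, and doubling the count to 80 makes the job slower rather than faster, which is the behavior that application scaling describes.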
In some applications, many jobs are run, each of short duration. With this scenario, the performance of the job scheduler is crucial. The CCS job scheduler was designed to handle this situation.
When evaluating cluster performance, it is important to be aware that benchmarks don't always tell the whole story. You must evaluate the performance based on your own needs and expectations. Evaluation should take place by using the application along with the data that will be running on the cluster. This will help to ensure a more accurate evaluation that will result in a system that better meets your needs.
For more information on cluster tuning, you can download the Performance Tuning a Compute Cluster white paper from the Microsoft Web site at http://go.microsoft.com/fwlink/?LinkId=87828
You can also find additional tips and new information on performance tuning at the HPC Team blog: http://windowshpc.net/blogs.
Table 5 deals with scalability and will help you make decisions based on the intended size of your cluster. The first part focuses on management scenarios, while the second part focuses on networking scenarios. For each scenario, there is an estimated number of nodes above which the scenario will manifest itself. If your cluster exceeds the specified number of nodes, you may need to use the Note column to plan accordingly, or to troubleshoot.
Table 5: Scalability Considerations
Management Scenario | Nodes | Note
MSDE on Head Node supports 8 or fewer concurrent connections. | 64+ | Use SQL Server 2005 on the head node (hard coded). 5-7 tables for scheduler. Use 8 tables for SDM.
RIS on Service Node supports only 80 machines simultaneously. | 64+ | Use ADS for CCS 2003 (ADS requires 32-bit).
ICS/NAT has an address range limit of 192.168.0.* | 250+ | Use DHCP Server instead.
The File server on Head Node only supports a limited number of simultaneous connections to SMB/NTFS. | 24 | Executable on compute nodes. Increase the number of connections that the file server on the head node can support (see KB 317249).
The DC/DNS server on Head Node is not optimal. It doesn't handle several NICs well. | 64+ | It is best to leverage corporate IT DC. Put DC/DNS on a separate management node.
ADS loses contact with compute nodes after WinsockDirect has been enabled. | N/A | Use clusrun or jobs to control the machine. If IPMI is available, use IPMI to reboot the machine into WinPE. WDS for next version of CCS works with WinsockDirect.
Cisco IB switch subnet manager is incompatible with openIB drivers. | N/A | Use openSM: disable Cisco IB switch subnet manager, then enable openSM.
An SDM update bottleneck exists. | 64+ | CCS V1 SP1
Job Scheduling bottlenecks exist. | 64+ | CCS V1 SP1
WinsockDirect (large scale only) | 64+ | CCS V1 SP1. Winsock Direct hotfixes 910481, 927620, 924286.
InfiniBand drivers (large scale only) | 64+ | This is fixed in openFabrics build 459, found at: http://windows.openib.org/downloads/binaries/
There is a bottleneck in the number of possible simultaneous connections with code path used when SYN attack protection is on. | 64+ | Disable SYN attack protection registry value: HKLM\System\CurrentControlSet\Services\Tcpip\Parameters, SynAttackProtect = 0
There are TCP timeouts on calling nodes when network is jammed (delay at switch). For example, mpi all reduce. | 64+ | Set TCP retransmission count to 0x20. Please note that this is hard to diagnose as one-to-all makes different nodes fail.
Latency is too high. | N/A | Use mpiexec -env IBWSD_POLL 500 linpack.
Bandwidth is too low. | N/A | Use mpiexec -env MPICH_SOCKET_SBUFFER_SIZE 0 to avoid copy on send to improve bandwidth. Only use this when Winsock Direct is enabled; it can cause lockup with GigE and IPoIB.
A Winsock Direct connection timeout exists. | N/A | Use mpiexec -env IBWSD_SA_TIMEOUT 1000 to set the subnet manager timeout to a higher value during Winsock Direct connection establishment.
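Several notes in Table 5 pass tuning variables to MPI jobs in the form mpiexec -env NAME VALUE. When more than one variable is needed, assembling the command by hand is error-prone, so the following Python sketch shows one way to build the argument list. The helper name mpiexec_command is hypothetical, the sketch only constructs the command and does not run it, and the variable values are the ones quoted in the table.

```python
def mpiexec_command(program_args, env=None):
    """Build an argument list of the form: mpiexec -env NAME VALUE ... program [args]."""
    command = ["mpiexec"]
    for name, value in (env or {}).items():
        # Each variable becomes its own "-env NAME VALUE" triple.
        command += ["-env", str(name), str(value)]
    command += list(program_args)
    return command


# Values taken from the table above; "linpack" is the example program named there.
cmd = mpiexec_command(["linpack"], {"IBWSD_POLL": 500, "IBWSD_SA_TIMEOUT": 1000})
print(" ".join(cmd))
```

Building the command as a list keeps each name and value as a separate argument, which avoids quoting problems if the list is later handed to a process launcher.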
Appendix B: Troubleshooting Your Cluster
In addition, Table 6 can help you troubleshoot problems with your cluster.
Table 6: Troubleshooting
Application Hangs

Issue: N/A
Mitigation: For ease of debugging, switch off shared memory communication using environment variable MPICH_DISABLE_SHM=1
Note: This is done from the command line with the command:
    mpiexec -env VARIABLE SETTING -env OTHERVARIABLE OTHERSETTING
Note: You can also set up WinDbg for just-in-time debugging with the command:
    Windbg -I
Details: MPI environment variable MPICH_DISABLE_SHM. If you disable the shared memory, MSMPI stops looking at communication between processors and focuses on network queues (node to node) communication.

Application Fails

Issue: Network connectivity issue
Mitigation: The last line of the MPI output file gives you information on network errors.
Note: The stdout output is located where you route it in your job; i.e., it is specified by the /stdout: switch to job submit.
Details: Output file

Issue: SYN protection interferes with connectivity under heavy load
Mitigation: Turn SYN protection off completely on all CNs (leave SYN protection active on the head node to avoid denial-of-service).
Details: Registry setting for SYN attack protection:
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\SynAttackProtect=0
To deploy this setting to all nodes:
    clusrun /all reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v SynAttackProtect /t REG_DWORD /d 0 /f
    clusrun /all shutdown -t 10 -r -f
Issue: Network connectivity failure
Mitigation: Identify node with defect. Use Pallas ping pong, one-to-all, all-to-all.
Details: A good set of tools for this are the Linux-based Intel MPI benchmarks (based on the Pallas test suite). These are available for download from http://windowshpc.net/files/4/porting_unix_code/entry373.aspx
Note: Because these tests are Linux-based, you will have to port them to CCS using the Subsystem for UNIX Applications (SUA). Instructions how to do this are included with the download.

Issue: Winsock Direct issues
Mitigation: Disable Winsock Direct (WSD) and use the IPoIB path instead of RDMA:
    Clusrun /all \\HEADNODE\IBDriverInstallPath\net\amd64\installsp -r
If it works when disabled, then try to repair IB connections:
    clusrun netsh interface set interface name=MPI admin=DISABLE/ENABLE
Validate that your cluster has the latest Winsock Direct patches.
Details: IB driver and Winsock Direct installation utility

Application Performance Not Optimal

Issue: Application not optimized for memory or CPU utilization
Mitigation: Check whether nodes are paging instead of using RAM. Check CPU utilization.
Details: Use perfmon counters. http://go.microsoft.com/fwlink/?LinkId=86619

Issue: Application doesn't scale to large number of nodes
Mitigation: Decrease the number of nodes used by the application until application performance comes back to expected level.

Issue: MSMPI does not balance well between same-node processor communication and node-to-node communication
Mitigation: Experiment with disabling the shared memory setting and see whether application performance improves. Especially relevant for message-intensive applications.
Details: MPICH_DISABLE_SHM

Issue: Messages are not coming in fast enough
Mitigation: GigE: Experiment with turning off the interrupt modulation to free up CPU usage. IB: Experiment with increasing polling of messages. Polling causes high CPU usage, so if usage is too high, it will be detrimental to the application computing CPU needs.
Details: openIB driver IBWSD_POLL
Issue: Connectivity to one or more nodes on the cluster is lost
Mitigation: Divide the cluster into subsets of nodes. Run Pallas ping pong or Pallas one-to-all or Pallas all-to-all on those subsets.
Details: Intel MPI benchmarks. This strategy breaks the cluster into subclusters to try to find where the issue is. In each subcluster, run sanity tests like the Pallas series in order to discover which subcluster contains the bad node.

Issue: Switches oversubscription not optimal
Mitigation: Try a higher number of uplinks.
Details: This strategy involves checking the number of uplinks/downlinks per switch to see whether this is the cause of poor application performance.

Issue: Send operation
Mitigation: Experiment with having no extra copy on the Send operation.
Details: MSMPI setting. Set MPICH_SOCKET_SBUFFER_SIZE to 0.
Note: This is done on the command line with the command:
    mpiexec -env VARIABLE SETTING -env OTHERVARIABLE OTHERSETTING
Note: This will lead to higher bandwidth but also to higher CPU utilization.
Note: Use this only when compute nodes are fitted with a WSD-enabled driver. Using a setting of 0 will cause the compute nodes on non-WSD networks to stop responding.

Issue: Memory bus bottleneck
Mitigation: Experiment with setting the processor affinity (assign an MPI process to a specific CPU or CPU core).
Details: An example of doing this from the command line:
    job submit /numprocessors:12 mpiexec /cmd /c setAffinity.bat myapp.exe
where setAffinity.bat consists of:
    @echo off
    set /a AFFINITY="1
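The mitigation in Table 6 for lost connectivity, dividing the cluster into subsets and testing each one, amounts to a bisection search. The Python sketch below illustrates that idea under two stated assumptions: there is exactly one bad node, and a subset fails the sanity test if and only if it contains that node. The function find_bad_node and the subset_passes callback are hypothetical names; in practice the callback would submit a benchmark run, such as the Pallas ping pong, to the given nodes, and a subset that has shrunk to a single node would need to be paired with a known-good node so that the test has a peer.

```python
def find_bad_node(nodes, subset_passes):
    """Narrow down a single faulty node by repeatedly testing half of the candidates.

    subset_passes(subset) must return True when the sanity test succeeds on the
    given list of nodes. Assumes exactly one bad node exists in `nodes`.
    """
    candidates = list(nodes)
    if not candidates:
        raise ValueError("at least one node is required")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # If the first half is healthy, the bad node must be in the second half.
        candidates = second if subset_passes(first) else first
    return candidates[0]


# Simulated cluster: the node names and the faulty node are made up for the example.
cluster = ["NODE%03d" % i for i in range(1, 17)]
faulty = "NODE011"
found = find_bad_node(cluster, lambda subset: faulty not in subset)
print(found)
```

For a 16-node cluster this needs four test runs instead of sixteen, which is why testing subsets is usually faster than testing nodes one at a time.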
Appendix C: Cluster Configuration and Deployment Scripts
These scripts are used to automatically add nodes to the cluster and to deploy images to nodes automatically without administrator intervention.
AddADSDevices.vbs. Parses an input file and uses the data to automatically populate ADS with the correct compute node information, including node names and MAC address values. These values are later used by Sysprep.exe to configure the node images during deployment.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb01.mspx
AddComputeNodes.csv. Sample input file that shows configuration information needed for adding nodes to the cluster. The easiest way to work with this file is to import it into Excel as a comma-delimited file, add the necessary values, including compute node MAC addresses, and then export the data as a comma-separated value file. Every item must have an entry or the input file will not work properly. If you do not have a value for a field, use a hyphen (-) for the field instead.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/node/ccnovb11.mspx
Capture-CCS-image-with-winpe.xml. ADS job template that captures a compute node image for later deployment to nodes on the cluster.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb04.mspx
Deploy-CCS-image-with-winpe.xml. ADS job template that deploys a compute node image to compute nodes on the cluster.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb02.mspx
Sysprep.inf. Generic configuration file for use with Sysprep.exe. Variable values are retrieved from ADS during the image deployment process.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb05.mspx
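Because the input file requires every field to have an entry, with a hyphen standing in for missing values, it can be useful to check the exported file before running the scripts. The Python sketch below replaces empty fields with a hyphen. It is an illustration only: the function name is hypothetical, and the column names in the sample data are made up, since the actual columns of AddComputeNodes.csv are defined by the sample file itself.

```python
import csv
import io


def fill_empty_fields(csv_text, placeholder="-"):
    """Return CSV text in which every empty or blank field is replaced by the placeholder."""
    reader = csv.reader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in reader:
        writer.writerow([field if field.strip() else placeholder for field in row])
    return output.getvalue()


# Hypothetical columns and values, used only to demonstrate the substitution.
sample = "Name,MAC,Notes\nNODE001,00:11:22:33:44:55,\nNODE002,,spare\n"
cleaned = fill_empty_fields(sample)
print(cleaned, end="")
```

In the sample, the empty Notes field of the first node and the empty MAC field of the second node each become a hyphen, while populated fields are left unchanged.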
The original high-performance compute cluster used additional scripts specific to its environment, including configuring InfiniBand networking. If you have similar needs, you can use these examples as a foundation for creating your own scripts and job templates.
ChangeIPforIB.vbs. Original script to configure IP over InfiniBand networking.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb03.mspx
Capture-image-with-winpe.xml. Original job template to capture compute node image.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb06.mspx
Deploy-image-on-16GB-with-winpe.xml. Original job template to deploy a compute node image to the compute nodes.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb07.mspx