Ohio Supercomputer Center
Cluster Computing Overview
Summer Institute for Advanced Computing, August 22, 2000
Doug Johnson, OSC
Overview
- What is Cluster Computing
- Why Cluster Computing
- How Clusters Fit with OSC Mission
- When Did It All Start
- OSC 128 Processor SGI/Linux Cluster
- Clusters for Production HPC Environments
What is Cluster Computing?
A cluster is a collection of interconnected whole computers used as a single, unified computer.
Cluster computing is many things:
- High performance computing: running programs with parallel algorithms
- High throughput computing: parametric studies, where the same program is run many times with different parameters (see the sketch after the diagram below)
- High availability computing: fail-over redundancy
Both scientific and commercial applications!
[Diagram: common resources of each cluster node, CPU(s), memory, hard drive, and network card, with the nodes joined by a network]
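To make the high throughput case concrete, here is a minimal sketch of a parametric study driven through a batch system such as PBS (the batch system used on the OSC clusters described later); the script, the ./model executable, and its parameter names are hypothetical, not part of any OSC configuration.

  #!/bin/sh
  # run_model.sh -- hypothetical PBS script: the same program is run
  # many times, one job per parameter value, passed in via "qsub -v"
  #PBS -N param-study
  #PBS -l nodes=1,walltime=1:00:00
  cd $PBS_O_WORKDIR
  ./model --input $PARAM > result.$PARAM

  # Submit one independent job per parameter value:
  for p in 0.1 0.2 0.3 0.4; do qsub -v PARAM=$p run_model.sh; done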
Brief History of Cluster Computing at OSC
- The Beowulf project at the Center of Excellence in Space Data and Information Sciences (CESDIS) installs the first cluster: 16 Intel 486 DX4 processors @ 100 MHz, 16 Mbytes RAM per processor, 10 Mbit Ethernet interconnect (3 per node)
- OSC installs the "Beaker" system, a dual-purpose workstation cluster: 12 DEC Alpha EV4 processors with a full-duplex FDDI interconnect
- OSC installs the "Trout" system, a dual-purpose workstation cluster: 14 SGI O2 workstations with R10000 processors @ 150 MHz, ATM interconnect
- OSC installs a 10 processor IA32 Linux cluster: Pentium II 400 MHz processors, Myrinet interconnect, 4.5 Gbytes RAM
- OSC installs the SGI/Linux 128 processor cluster: Pentium III Xeon 550 MHz processors, 66 Gbytes RAM, Myrinet and 100 Mbit Ethernet interconnects
Why Parallel Computing
Parallel computing is a strong presence at the national level and is the future of high performance computing (HPC).
Parallel computing platforms are a vital element in our infrastructure.
Parallel systems have traditionally not been an accessible resource compared to single processor systems:
- Higher cost (due mostly to the high performance interconnect)
- Less refined user interface
- Non-traditional programming techniques, with little training available
OSC Mission Statement
OSC provides a reliable high performance computing and communications infrastructure for a diverse, statewide/regional community including education, academic research, industry, and state government.
…
Why Cluster Computing
OSC evaluates new and emerging information technologies, and cluster computing is one of the hottest fields in high performance computing.
Potential benefits of clusters over traditional parallel systems:
- High performance interconnect technology is approaching commodity availability
- Performance of commodity systems is increasing at an aggressive rate, driven by the commercial market for home/office workstations
OSC Mission Statement
...
In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies.
...
Why Cluster Computing
Potential benefits of clusters over traditional parallel systems (cont.):
- The operating system gives users the same environment on their desks that they have on the parallel system
Other differences:
- System administration implications: with no single system image, OS and software upgrades must be applied to all nodes
- Cluster design lends itself to more frequent hardware upgrades
- Performance implications
- Accounting/funding implications
How Clusters Fit With OSC Mission
OSC evaluates new and emerging information technologies:
- Multiple software packages have been evaluated to provide the most robust system
- Four different network interconnects have been installed to evaluate performance
- Three different processors and operating systems were investigated
OSC implements new and emerging information technologies:
- A cluster under OSC administration has been available to users since March 1999
- OSC partnered with the Portland Group to bring its Cluster Development Kit to OSC users
OSC supports new and emerging information technologies:
- The OSC 128 processor cluster is in production status
- Training classes on how to build and use a cluster
- Staff are available to Ohio faculty to help answer questions and troubleshoot problems
OSC Mission Statement
...
In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies.
...
To Summarize
- Develop cluster technology so that it can be rolled out to university research labs: provide a hardware and software configuration that will allow labs to construct a working cluster with minimal effort; experienced OSC staff can provide technical assistance
- Evaluate software and hardware configurations to assist researchers in defining a system that will best suit their needs: let the researchers focus on science; based on user applications, provide performance analysis showing the optimal hardware and software configuration
- OSC wants to encourage parallel programming: parallel programming is the future of high performance computing, and clusters provide increased access to parallel systems
When Did It All Start?
December 1998: OSC management authorizes a dedicated 10 processor cluster for technology evaluation
- 1 front-end node: 2 Intel Pentium II 400 MHz processors, 512 Mbytes RAM, 18 Gbyte disk
- 4 compute nodes: 2 Intel Pentium II 400 MHz processors each, 1 Gbyte RAM, 9 Gbyte disk
- Interconnects: 100 Mbit Ethernet, Dolphinics SCI, Myricom Myrinet
- Linux OS, PBS batch system, PGI compiler suite
April 1999: performance evaluation yields promising results and the machine is opened to users
OSC/SGI Cluster
September 1999: agreement signed between OSC and SGI
October 1999: system powered on
November 1999: machine configured and running applications on the floor of Supercomputing 99
December 1999: machine installed at OSC
February 2000: machine opened to friendly users
Hardware
1 front-end node configured with:
- Two Gigabytes of RAM
- Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
- 48 Gigabytes of ultra-wide SCSI hard drives
- Two 100Base-T Ethernet interfaces
- One HIPPI interface
32 compute nodes, each configured with:
- Two Gigabytes of RAM
- Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
- 18 Gigabytes of ultra-wide SCSI hard drives
- Two Myrinet interfaces
- One 100Base-T Ethernet interface
All nodes are SGI 1400L servers.
Software and Configuration
- Hardware originally assembled in Mountain View, CA by SGI Professional Services
- OS and software environment installed and configured by OSC staff:
  - Linux operating system
  - Portable Batch System (PBS)
  - Portland Group compiler suite
  - Myrinet MPICH-GM interface
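As a concrete illustration of this stack, here is a minimal sketch of how a user might build and run a parallel job on such a system; hello.c, the process counts, and the exact wrapper and launcher names (mpicc, mpirun) are assumptions that can vary between MPICH-GM installations.

  # Compile an MPI program with the MPICH-GM compiler wrapper (which
  # in turn can be pointed at the Portland Group compilers)
  mpicc -O2 -o hello hello.c

  # Hypothetical PBS script: request 4 quad-processor nodes and start
  # 16 MPI processes over the Myrinet/GM transport
  #PBS -l nodes=4:ppn=4,walltime=0:10:00
  cd $PBS_O_WORKDIR
  mpirun -np 16 ./hello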
Clusters for Production HPC Environments
There are two significant efforts in building clusters:
- Building a cluster and making it operational
- Making the cluster a production system:
  - Ability to host multiple users simultaneously
  - Ability to schedule system resources
  - Ability to function without constant intervention
The OSC cluster has the following attributes that make it a true HPC production system:
- Connection to a Mass Storage System (MSS)
- Integration into the OSC account database system
- Job accounting
- Good utilization
- High availability
Mass Storage Support
[Diagram: the cluster nodes connect over a private 100 Mbit switched Ethernet network and a HIPPI link to an Origin 2000 running the Data Migration Facility (DMF), with 1 Terabyte of disk storage and an IBM 3494 library holding 30 Terabytes of tape storage]
User Accounts and Accounting
User Accounts
- The cluster is integrated into the Center's database system for automatic account generation and maintenance
Job Accounting
- Accounting has been configured into the environment to track users' CPU usage
- CPU usage is converted with a charging algorithm and deducted from a Principal Investigator's account (a sketch of this kind of calculation follows below)
- Users can view their accounting history with a text command from the Linux command prompt
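The charging algorithm itself is not spelled out here; what follows is a purely illustrative sketch, assuming a made-up rate of 1 resource unit per CPU-hour and the standard semicolon-separated PBS accounting log, whose job-end (type "E") records carry a resources_used.cput field.

  # Hypothetical sketch: sum the CPU time recorded in one day's PBS
  # accounting log and convert it to resource units to deduct (gawk
  # is needed for the three-argument match())
  gawk -v RATE=1.0 -F';' '$2 == "E" {
      if (match($0, /resources_used\.cput=([0-9]+):([0-9]+):([0-9]+)/, t))
          secs += t[1]*3600 + t[2]*60 + t[3]
  } END { printf "charge: %.2f units\n", (secs/3600) * RATE }' \
      /var/spool/pbs/server_priv/accounting/20000822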
Utilization and Availability
Utilization
- System utilization is recorded and accessible via a web link
- For parallel systems, utilization is expected to be around 50 to 70%
- Current utilization is about 70% parallel and 30% serial
Availability
- Good availability has been achieved through significant uptime and minimal system problems
- Downtime is scheduled every 4 weeks for software upgrades, hardware modifications, and general system maintenance
TCP Stream Performance
[Chart: TCP stream throughput in Megabits/second (0 to 350) versus block size in bytes (0 to 9,000,000) for HIPPI, Gigabit Ethernet, and Fast Ethernet]
TCP Stream Performance
[Chart: TCP stream throughput in Megabits/second (0 to 300) versus block size in bytes (0 to 250,000) for HIPPI, Gigabit Ethernet, and Fast Ethernet]
UDP Stream Performance
./netperf -l 60 -H fe.ovl.osc.edu -i 10,2 -I 99,10 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768
UDP UNIDIRECTIONAL SEND TEST to fe.ovl.osc.edu : +/-5.0% @ 99% conf.
Socket  Message  Elapsed  Messages
Size    Size     Time     Okay     Errors  Throughput
bytes   bytes    secs     #        #       10^6bits/sec
131070  1472     59.99    3229909  0       634.03
524288           59.99    2169706          425.91
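For reference on the netperf invocation above: -l 60 runs the test for 60 seconds, -H names the remote host, and -i 10,2 with -I 99,10 repeat the measurement until the result lies within a 10% wide interval (the reported +/-5.0%) at 99% confidence. The options after the -- separator are test-specific: -m 1472 uses the largest UDP payload that fits in a standard Ethernet frame after the IP and UDP headers, and -s/-S request 32 Kbyte local and remote socket buffers (the sizes the operating systems actually granted appear in the Socket Size column). The first result row is the sending side and the second the receiving side; the drop from about 634 to 426 Mbits/second indicates that roughly a third of the datagrams were lost in flight or dropped at the receiver.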