Beowulf Clusters
Matthew Doney
What is a cluster?
A cluster is a group of computers connected so that they can work together
Several different methods of connecting them
Distributed
Computers widely separated, connected over the internet
Used by projects such as SETI@home and GIMPS
Workstation Cluster
Collection of workstations loosely connected by a LAN
Cluster Farm
PCs connected over a LAN that perform work when idle
What is a Beowulf Cluster?
A Beowulf cluster is one class of cluster computer
Uses Commercial Off The Shelf (COTS) hardware
Typically contains both master and slave nodes
Not defined by a specific piece of hardware
Image Source: http://www.cse.mtu.edu/Common/cluster.jpg
What is a Beowulf Cluster?
The origin of the name “Beowulf”
Main character of the Old English poem of the same name
Described in the poem – “he has thirty men’s heft of grasp in the gripe of his hand, the bold-in-battle”.
Image Source: http://www.teachingcollegeenglish.com/wp-content/uploads/2011/06/lynd-ward-17-jnanam-dot-net.jpg
Cluster Computer History – 1950s
SAGE, one of the first cluster computers
Developed by IBM for NORAD
Linked radar stations together to form the first early-warning detection system
Image Source: http://www.ieeeghn.org/wiki/images/3/34/Sage_nomination.jpg
Cluster Computer History – 1970s
Technological Advancements
VLSI (Very Large Scale Integration)
Ethernet
UNIX Operating System
Cluster Computer History – 1980s
Increased interest in cluster computing
Ex: NSA connected 160 Apollo workstations in a cluster configuration
First widely used clustering product: VAXcluster
Development of task scheduling software
Condor package developed by UW-Madison
Development of parallel programming software
PVM (Parallel Virtual Machine)
Cluster Computer History – 1990s
NOW (Network of Workstations) project at UC Berkeley
First cluster on the TOP500 list
Development of Myrinet LAN system
Beowulf project started at NASA’s Goddard Space Flight Center
Image Source: http://www.cs.berkeley.edu/~pattrsn/Arch/NOW2.jpg
Cluster Computer History - Beowulf
Developed by Thomas Sterling and Donald Becker
16 Individual nodes
100 MHz Intel 80486 processors
16 MB memory, 500 MB hard drive
Two 10 Mbps Ethernet ports
Early version of Linux
Used PVM library
Cluster Computer History – 1990s
MPI standard developed
Created to be a global standard to replace existing message passing protocols
DOE, NASA, California Institute of Technology collaboration
Developed a Beowulf system with sustained performance of 1 Gflops
Cost $50,000
Awarded Gordon Bell prize for price/performance
28 Clusters were on the TOP500 list by the end of the decade
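The price/performance figure behind the Gordon Bell award can be worked out from the two numbers quoted above. A minimal Python sketch; the function name is illustrative and not from the original slides:

```python
def dollars_per_mflops(cost_dollars: float, sustained_gflops: float) -> float:
    """Return the cost per sustained Mflops (1 Gflops = 1000 Mflops)."""
    return cost_dollars / (sustained_gflops * 1000.0)

# Figures quoted on the slide: $50,000 for 1 Gflops sustained.
beowulf_cost = dollars_per_mflops(50_000, 1.0)
print(f"${beowulf_cost:.2f} per Mflops")  # prints "$50.00 per Mflops"
```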
Beowulf Cluster Advantages
Price/Performance
Using COTS hardware greatly reduces associated costs
Scalability
Because the system is built from individual nodes, more can be added with only minor changes to the network
Convergence Architecture
Using commodity hardware has standardized operating systems, instruction sets, and communication protocols
Code portability has greatly increased
Beowulf Cluster Advantages
Flexibility of Configuration and Upgrades
Large variety of COTS components
Standardization of COTS components allows for easy upgrades
Technology Tracking
Can use new components as soon as they come out
No delay time waiting for manufacturers to integrate components
High Availability
System will continue to run if an individual node fails
Beowulf Cluster Advantages
Level of Control
System is easily configured to the user's liking
Development Cost and Time
No special hardware needs to be designed
Less time spent designing the system; just pick the parts to be used
Cheaper mass market components
Beowulf Cluster Disadvantages
Programming Difficulty
Programs need to be highly parallelized to take advantage of hardware design
Distributed Memory
Program data is split over the individual nodes
Network speed can bottleneck performance
Results may need to be compiled by a single node
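The distributed-memory pattern described above (split the data, compute locally, combine on one node) can be sketched in plain Python. This is only an illustration under simplifying assumptions: the "nodes" are ordinary lists in one process, and the sum-of-squares workload and function names are hypothetical.

```python
from typing import List

def split_across_nodes(data: List[int], num_nodes: int) -> List[List[int]]:
    """Divide data into num_nodes nearly equal contiguous chunks."""
    base, extra = divmod(len(data), num_nodes)
    chunks, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # first `extra` nodes get one more item
        chunks.append(data[start:start + size])
        start += size
    return chunks

def node_work(chunk: List[int]) -> int:
    """Work done locally on one node: sum of squares of its own chunk."""
    return sum(x * x for x in chunk)

data = list(range(10))
chunks = split_across_nodes(data, 4)        # each node holds only its chunk
partials = [node_work(c) for c in chunks]   # on a real cluster these run in parallel
total = sum(partials)                       # partial results combined by a single node
print(chunks, partials, total)
```

On a real cluster each partial result would cross the network before being combined, which is where the interconnect can become the bottleneck.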
Beowulf Cluster Architecture
Master-Slave configuration
Master Node
Job scheduling
System monitoring
Resource management
Slave Node
Does assigned work
Communicates with other slave nodes
Sends results to master node
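The division of labor above can be mimicked on a single machine. The following is a rough sketch, not a cluster implementation: threads stand in for slave nodes, the squaring job is a placeholder, and communication between slave nodes is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

def worker_task(job_id: int) -> tuple:
    """A worker does its assigned job and returns the result to the master."""
    return job_id, job_id ** 2

def master(jobs, num_workers: int = 3) -> dict:
    """The master hands jobs to a pool of workers and collects the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in the order the jobs were submitted.
        for job_id, value in pool.map(worker_task, jobs):
            results[job_id] = value
    return results

results = master(range(5))
print(results)
```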
Node Hardware
Typically desktop PCs
Can consist of other types of computers, e.g.:
Rack-mount servers
Case-less motherboards
PS3s
Raspberry Pi boards
Node Software
Operating System
Resource Manager
Message Passing Software
Resource Management Software
Condor
Developed by UW-Madison
Allows distributed job submission
PBS (Portable Batch System)
Initially developed by NASA
Developed to schedule jobs on parallel compute clusters
Maui
Adds enhanced monitoring to an existing job scheduler (e.g. PBS)
Allows administrator to set individual and group job priorities
Sample Condor Submit File
Submits 150 copies of the program foo
Each copy of the program has its own input, output, and error message file
All of the log information from Condor goes to one file
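The submit file itself did not survive in this text. A description matching the three bullets above might look like the text produced by the Python sketch below. The file-naming pattern (foo.N.in, foo.N.out, foo.N.err, foo.log) is an assumption; `$(Process)` is the Condor macro that expands to the copy number, 0 through 149.

```python
def condor_submit_text(executable: str, copies: int) -> str:
    """Build a Condor submit description; file names are assumed, not from the slide."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        # $(Process) gives every copy its own input, output, and error file.
        f"input      = {executable}.$(Process).in",
        f"output     = {executable}.$(Process).out",
        f"error      = {executable}.$(Process).err",
        # One shared log file for all copies.
        f"log        = {executable}.log",
        f"queue {copies}",
    ]
    return "\n".join(lines) + "\n"

submit_text = condor_submit_text("foo", 150)
print(submit_text)
```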
Sample Maui Configuration File
User yangq has the highest priority; users in group ART have the lowest
Members of group CS_SE are limited to 20 jobs which use no more than 100 nodes
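The configuration file is not reproduced here, so the sketch below only models the policy the bullets describe; it is not Maui syntax. The numeric priority values are hypothetical (only their ordering matters), and the node limit is assumed to apply to the group's running jobs in total.

```python
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    group: str
    nodes: int

# Hypothetical values chosen only to reproduce the ordering on the slide.
USER_PRIORITY = {"yangq": 1000}
GROUP_PRIORITY = {"ART": -1000}
GROUP_LIMITS = {"CS_SE": {"max_jobs": 20, "max_nodes": 100}}

def priority(job: Job) -> int:
    """Combine user and group priority; unknown users and groups get 0."""
    return USER_PRIORITY.get(job.user, 0) + GROUP_PRIORITY.get(job.group, 0)

def within_limits(job: Job, running: list) -> bool:
    """Check whether starting `job` keeps its group inside its job and node limits."""
    limits = GROUP_LIMITS.get(job.group)
    if limits is None:
        return True
    same_group = [j for j in running if j.group == job.group]
    jobs_after = len(same_group) + 1
    nodes_after = sum(j.nodes for j in same_group) + job.nodes
    return jobs_after <= limits["max_jobs"] and nodes_after <= limits["max_nodes"]

queue = [Job("alice", "ART", 2), Job("bob", "CS_SE", 8), Job("yangq", "CS_SE", 4)]
ordered = [j.user for j in sorted(queue, key=priority, reverse=True)]
print(ordered)
```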
Sample PBS Submit File
Submits job “my_job_name” that needs 1 hour and 4 CPUs with 2GB of memory
Uses file “my_job_name.in” as input
Uses file “my_job_name.log” as output
Uses file “my_job_name.err” as error output
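The submit script is missing from this text. The Python sketch below generates one plausible TORQUE-style PBS script for the job described above. Several details are assumptions: the four CPUs are requested on a single node (`nodes=1:ppn=4`), the program name `./my_program` is hypothetical, and the input file is fed to it on standard input.

```python
def pbs_script_text(job_name: str, program: str) -> str:
    """Build a PBS job script for 1 hour, 4 CPUs, and 2 GB of memory."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        "#PBS -l walltime=01:00:00",    # 1 hour of wall-clock time
        "#PBS -l nodes=1:ppn=4",        # assumption: 4 CPUs on one node
        "#PBS -l mem=2gb",
        f"#PBS -o {job_name}.log",      # standard output
        f"#PBS -e {job_name}.err",      # standard error
        "cd $PBS_O_WORKDIR",            # run from the submission directory
        f"{program} < {job_name}.in",   # hypothetical program reading the input file
    ]
    return "\n".join(lines) + "\n"

pbs_text = pbs_script_text("my_job_name", "./my_program")
print(pbs_text)
```

A script like this would typically be handed to the scheduler with `qsub`.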
Message Passing Software
MPI (Message Passing Interface)
Widely used in HPC community
Specification is controlled by the MPI Forum
Available for free
PVM (Parallel Virtual Machine)
First message passing protocol to be widely used
Provides for fault-tolerant operation
MPI Hello World Example
MPI Hello World Example (cont.)
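The example code from these slides is not preserved in this text. As a stand-in, here is a hello-world sketch using the third-party mpi4py bindings, which is an assumption about the environment rather than what the slides showed. If mpi4py is not installed, the sketch falls back to a single pretend process so that it still runs.

```python
try:
    from mpi4py import MPI  # third-party package; assumed available on the cluster
except ImportError:
    MPI = None  # fallback: behave like a single-process run

def hello_message() -> str:
    """Return a greeting that identifies this process within the MPI job."""
    if MPI is None:
        rank, size, name = 0, 1, "localhost"
    else:
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()            # this process's ID, 0..size-1
        size = comm.Get_size()            # total number of processes in the job
        name = MPI.Get_processor_name()   # host the process is running on
    return f"Hello from rank {rank} of {size} on {name}"

print(hello_message())
```

Saved as, say, `hello.py` (a hypothetical file name), this would typically be launched with something like `mpiexec -n 4 python hello.py`, so that each of the four processes prints its own rank.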
PVM Hello World Example
Interconnection Hardware
Two main choices – technology and topology
Main Technologies
Ethernet with speeds up to 10 Gbps
InfiniBand with speeds up to 300 Gbps
Image Source: http://www.sierra-cables.com/Cables/Images/12X-Infiniband-R.jpg
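To give a feel for what these peak rates mean, the Python sketch below estimates the best-case time to move a block of data between two nodes. It assumes the link runs at its full rated speed and ignores latency and protocol overhead, so real transfers will be slower; the 1 GB payload is an arbitrary example.

```python
def transfer_seconds(size_gigabytes: float, link_gbps: float) -> float:
    """Best-case transfer time: payload bits divided by link bits per second."""
    bits = size_gigabytes * 8e9  # 1 GB = 8e9 bits (decimal gigabytes)
    return bits / (link_gbps * 1e9)

ethernet_time = transfer_seconds(1.0, 10.0)      # 10 Gbps Ethernet
infiniband_time = transfer_seconds(1.0, 300.0)   # 300 Gbps InfiniBand
print(f"Ethernet: {ethernet_time:.3f} s, InfiniBand: {infiniband_time:.3f} s")
```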
Interconnection Topology
Torus Network
Bus Network
Flat Neighborhood Network
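Of the topologies listed, the torus is the easiest to illustrate in a few lines. The Python sketch below assumes a 2D torus in which nodes are addressed by (row, column) and each edge of the grid wraps around to the opposite edge; the function name and the 4x4 grid size are illustrative.

```python
def torus_neighbors(row: int, col: int, rows: int, cols: int) -> list:
    """Return the four neighbors of a node in a 2D torus (edges wrap around)."""
    return [
        ((row - 1) % rows, col),  # up
        ((row + 1) % rows, col),  # down
        (row, (col - 1) % cols),  # left
        (row, (col + 1) % cols),  # right
    ]

corner = torus_neighbors(0, 0, 4, 4)
print(corner)  # prints [(3, 0), (1, 0), (0, 3), (0, 1)]
```

Because of the wraparound, a corner node has four direct neighbors just like an interior node.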
Questions???