Parallex - The Supercomputer
The Super Computer
PARALLEX – THE SUPER COMPUTER
A PROJECT REPORT
Submitted by
Mr. AMIT KUMAR
Mr. ANKIT SINGH
Mr. SUSHANT BHADKAMKAR
in partial fulfillment for the award of the degree
Of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE
GUIDE: MR. ANIL KADAM
AISSMS’S COLLEGE OF ENGINEERING, PUNE
UNIVERSITY OF PUNE
2007 - 2008
CERTIFICATE
Certified that this project report “Parallex - The Super Computer” is
the bonafide work of
Mr. AMIT KUMAR (Seat No. :: B3*****7)
Mr. ANKIT SINGH (Seat No. :: B3*****8)
Mr. SUSHANT BHADKAMKAR (Seat No. :: B3*****2)
who carried out the project work under my supervision.
Prof. M. A. Pradhan Prof. Anil Kadam
HEAD OF DEPARTMENT GUIDE
Acknowledgment
The success of any project is never limited to the individual undertaking
the project; it is the collective effort of the people around the individual that
spells success. There are some key personalities involved whose roles have
been vital in paving the way for the success of the project. We take this
opportunity to express our sincere thanks and gratitude to them.
We would like to thank all the faculty members (teaching & non-teaching) of the
Computer Engineering Department of AISSMS College of Engineering,
Pune. Our project guide Prof. Anil Kadam was very generous in his
time and knowledge with us. We are grateful to Mr. Shasikant
Athavale who was the source of constant motivation and inspiration for
us. We are very thankful for, and obliged by, the valuable suggestions
constantly given by Prof. Nitin Talhar and Ms. Sonali Nalamwar,
which proved very helpful to the success of our project. Our
deepest gratitude to Prof. M. A. Pradhan for her thoughtful comments
accompanied with her gentle support during the academics.
We would like to thank the college authorities for providing us with full
support regarding lab, network and related software.
Abstract
Parallex is a parallel processing cluster consisting of control nodes and
execution nodes. Our implementation removes all the requirements of kernel level
modification and kernel patches to run a Beowulf cluster system. There can be many
control nodes in a typical Parallex cluster, and they no longer just monitor the
cluster but also take part in execution if resources permit. We have removed all the
restrictions of kernel, architecture, and platform dependence, making our cluster
system work with completely different sets of CPU powers, operating systems, and
architectures, without the use of any existing parallel libraries such as MPI
and PVM.
With a radically new perspective of how parallel system is supposed to be, we
have implemented our own distribution algorithms and parallel algorithms aimed at
ease of administration and simplicity of usage, without compromising the efficiency.
With a fully modular 7-step design we attack the traditional complications and
deficiencies of existing parallel systems, such as redundancy, scheduling, cluster
accounting and parallel monitoring.
A typical Parallex cluster may consist of a few old 386s running NetBSD,
some ultra-modern Intel Dual-Core machines running Linux, and some server-class
MIPS processors running IRIX, all working in parallel with full homogeneity.
Table of Contents
Chapter No. Title Page No.
LIST OF FIGURES I
LIST OF TABLES II
1. A General Introduction
1.1 Basic concepts 1
1.2 Promises and Challenges 5
1.2.1 Processing technology 6
1.2.2 Networking technology 6
1.2.3 Software tools and technology 7
1.3 Current scenario 8
1.3.1 End user perspectives 8
1.3.2 Industrial perspective 8
1.3.3 Developers, researchers & scientists perspective 9
1.4 Obstacles and Why we don’t have 10 GHz today 9
1.5 Myths and Realities: 2 x 3 GHz < 6GHz 10
1.6 The problem statement 11
1.7 About PARALLEX 11
1.8 Motivation 12
1.9 Features of PARALLEX 13
1.10 Why our design is “alternative” to parallel system 13
1.11 Innovation 14
2. REQUIREMENT ANALYSIS 16
2.1 Determining the overall mission of Parallex 16
2.2 Functional requirement for Parallex system 16
2.3 Non-functional requirement for system 17
3. PROJECT PLAN 19
4. SYSTEM DESIGN 21
5. IMPLEMENTATION DETAIL 24
5.1 Hardware architecture 24
5.2 Software architecture 26
5.3 Description for software behavior 28
5.3.1 Events 32
5.3.2 States 32
6. TECHNOLOGIES USED 33
6.1 General terms 33
7. TESTING 35
8. COST ESTIMATION 44
9. USER MANUAL 45
9.1 Dedicated cluster setup 45
9.1.1 BProc Configuration 45
9.1.2 Bringing up BProc 47
9.1.3 Build phase 2 image 48
9.1.4 Loading phase 2 image 48
9.1.5 Using the cluster 49
9.1.6 Managing the cluster 50
9.1.7 Troubleshooting techniques 51
9.2 Shared cluster setup 52
9.2.1 DHCP 52
9.2.2 NFS 54
9.2.2.1 Running NFS 55
9.2.3 SSH 57
9.2.3.1 Using SSH 60
9.2.4 Host file and name service 65
9.3 Working with PARALLEX 65
10. CONCLUSION 67
11. FUTURE ENHANCEMENT 68
12. REFERENCE 69
APPENDIX A 70 – 77
APPENDIX B 78 – 88
GLOSSARY 89 – 92
MEMORABLE JOURNEY (PHOTOS) 93 – 95
PARALLEX ACHIEVEMENTS 96 - 97
I. LIST OF FIGURES:
1.1 High-performance distributed system.
1.2 Transistor vs. Clock Speed
4.1 Design Framework
4.2 Parallex Design
5.1 Parallel System H/W Architecture
5.2 Parallel System S/W Architecture
7.1 Cyclomatic Diagram for the system
7.2 System Usage pattern
7.3 Histogram
7.4 One frame from Complex Rendering on Parallex: Simulation of an
explosion
II. LIST OF TABLES:
1.1 Project Plan
7.1 Logic/Coverage/Decision Testing
7.2 Functional Test
7.3 Console Test cases
7.4 Black box Testing
7.5 Benchmark Results
The Super Computer
AISSMS College of Engineering - 1 -
Chapter 1. A General Introduction
1.1 BASIC CONCEPTS
The last two decades spawned a revolution in the world of computing; a move away
from central mainframe-based computing to network-based computing. Today,
servers are fast achieving the levels of CPU performance, memory capacity, and I/O
bandwidth once available only in mainframes, at cost orders of magnitude below that
of a mainframe. Servers are being used to solve computationally intensive problems
in science and engineering that once belonged exclusively to the domain of
supercomputers. A distributed computing system is the system architecture that makes
a collection of heterogeneous computers, workstations, or servers act and behave as a
single computing system. In such a computing environment, users can uniformly
access and name local or remote resources, and run processes from anywhere in the
system, without being aware of which computers their processes are running on.
Distributed computing systems have been studied extensively by researchers, and a
great many claims and benefits have been made for using such systems. In fact, it is
hard to rule out any desirable feature of a computing system that has not been claimed
to be offered by a distributed system [24]. However, the current advances in
processing and networking technology and software tools make it feasible to achieve
the following advantages:
• Increased performance. The existence of multiple computers in a distributed system
allows applications to be processed in parallel and thus improves application and
system performance. For example, the performance of a file system can be improved
by replicating its functions over several computers; the file replication allows several
applications to access that file system in parallel. Furthermore, file replication
distributes network traffic associated with file access across the various sites and thus
reduces network contention and queuing delays.
• Sharing of resources. Distributed systems are cost-effective and enable efficient
access to all system resources. Users can share special purpose and sometimes
expensive hardware and software resources such as database servers, compute servers,
virtual reality servers, multimedia information servers, and printer servers, to name
just a few.
• Increased extendibility. Distributed systems can be designed to be modular and
adaptive so that for certain computations, the system will configure itself to include a
large number of computers and resources, while in other instances, it will just consist
of a few resources. Furthermore, limitations in file system capacity and computing
power can be overcome by adding more computers and file servers to the system
incrementally.
• Increased reliability, availability, and fault tolerance. The existence of multiple
computing and storage resources in a system makes it attractive and cost-effective to
introduce fault tolerance to distributed systems. The system can tolerate the failure in
one computer by allocating its tasks to another available computer. Furthermore, by
replicating system functions and/or resources, the system can tolerate one or more
component failures.
• Cost-effectiveness. The performance of computers has been approximately doubling
every two years, while their cost has decreased by half every year during the last
decade. Furthermore, the emerging high speed network technology [e.g., wave-
division multiplexing, asynchronous transfer mode (ATM)] will make the
development of distributed systems attractive in terms of the price/performance ratio
compared to that of parallel computers. These advantages cannot be achieved easily
because designing a general purpose distributed computing system is several orders of
magnitude more difficult than designing centralized computing systems—designing a
reliable general-purpose distributed system involves a large number of options and
decisions, such as the physical system configuration, communication network and
computing platform characteristics, task scheduling and resource allocation policies
and mechanisms, consistency control, concurrency control, and security, to name just
a few. The difficulties can be attributed to many factors related to the lack of maturity
in the distributed computing field, the asynchronous and independent behavior of the
systems, and the geographic dispersion of the system resources. These are
summarized in the following points:
• There is a lack of a proper understanding of distributed computing theory—the field
is relatively new and we need to design and experiment with a large number of
general-purpose reliable distributed systems with different architectures before we can
master the theory of designing such computing systems. One interesting explanation
for the lack of understanding of the design process of distributed systems was given
by Mullender. Mullender compared the design of a distributed system to the design of
a reliable national railway system that took a century and a half to be fully understood
and mature. Similarly, distributed systems (which have been around for
approximately two decades) need to evolve into several generations of different
design architectures before their designs, structures, and programming techniques can
be fully understood and mature.
• The asynchronous and independent behavior of the system resources and/or
(hardware and software) components complicate the control software that aims at
making them operate as one centralized computing system. If the computers are
structured in a master–slave relationship, the control software is easier to develop and
system behavior is more predictable. However, this structure is in conflict with the
distributed system property that requires computers to operate independently and
asynchronously.
• The use of a communication network to interconnect the computers introduces
another level of complexity. Distributed system designers not only have to master the
design of the computing systems and system software and services, but also have to
master the design of reliable communication networks, how to achieve
synchronization and consistency, and how to handle faults in a system composed of
geographically dispersed heterogeneous computers. The number of resources
involved in a system can vary from a few to hundreds, thousands, or even hundreds of
thousands of computing and storage resources.
Despite these difficulties, there has been limited success in designing special-purpose
distributed systems such as banking systems, online transaction systems, and point-of-
sale systems. However, the design of a general purpose reliable distributed system
that has the advantages of both centralized systems (accessibility, management, and
coherence) and networked systems (sharing, growth, cost, and autonomy) is still a
challenging task. Kleinrock makes an interesting analogy between the human-made
computing systems and the brain. He points out that the brain is organized and
structured very differently from our present computing machines. Nature has been
extremely successful in implementing distributed systems that are far more intelligent
and impressive than any computing machines humans have yet devised. We have
succeeded in manufacturing highly complex devices capable of high speed
computation and massive accurate memory, but we have not gained sufficient
understanding of distributed systems; our systems are still highly constrained and
rigid in their construction and behavior. The gap between natural and man-made
systems is huge, and more research is required to bridge this gap and to design better
distributed systems. In the next section we present a design framework to better
understand the architectural design issues involved in developing and implementing
high performance distributed computing systems. A high-performance distributed
system (HPDS) (Figure 1.1) includes a wide range of computing resources, such as
workstations, PCs, minicomputers, mainframes, supercomputers, and other special-
purpose hardware units. The underlying network interconnecting the system resources
can span LANs, MANs, and even WANs, can have different topologies (e.g., bus,
ring, full connectivity, random interconnect), and can support a wide range of
communication protocols.
Fig. 1.1 High-performance distributed system.
1.2 PROMISES AND CHALLENGES OF PARALLEL AND
DISTRIBUTED SYSTEMS
The proliferation of high-performance systems and the emergence of high speed
networks (terabit networks) have attracted a lot of interest in parallel and distributed
computing. The driving forces toward this end will be
(1) The advances in processing technology,
(2) The availability of high-speed network, and
(3) The increasing research efforts directed toward the development of software
support and programming environments for distributed computing.
Further, with the increasing requirements for computing power and the diversity in
the computing requirements, it is apparent that no single computing platform will
meet all these requirements. Consequently, future computing environments need to
capitalize on and effectively utilize the existing heterogeneous computing resources.
Only parallel and distributed systems provide the potential of achieving such an
integration of resources and technologies in a feasible manner while retaining desired
usability and flexibility. Realization of this potential, however, requires advances on a
number of fronts: processing technology, network technology, and software tools and
environments.
1.2.1 Processing Technology
Distributed computing relies to a large extent on the processing power of the
individual nodes of the network. Microprocessor performance has been growing at a
rate of 35 to 70 percent per year during the last decade, and this trend shows no indication of
slowing down in the current decade. The enormous power of the future generations of
microprocessors, however, cannot be utilized without corresponding improvements in
memory and I/O systems. Research in main-memory technologies, high-performance
disk arrays, and high-speed I/O channels are, therefore, critical to utilize efficiently
the advances in processing technology and the development of cost-effective high
performance distributed computing.
1.2.2 Networking Technology
The performance of distributed algorithms depends to a large extent on the bandwidth
and latency of communication among work nodes. Achieving high bandwidth and
low latency involves not only fast hardware, but also efficient communication
protocols that minimize the software overhead. Developments in high-speed networks
provide gigabit bandwidths over local area networks as well as wide area networks at
moderate cost, thus increasing the geographical scope of high-performance distributed
systems.
The problem of providing the required communication bandwidth for distributed
computational algorithms is now relatively easy to solve given the mature state of
fiber-optic and optoelectronic device technologies. Achieving the low latencies
necessary, however, remains a challenge. Reducing latency requires progress on a
number of fronts. First, current communication protocols do not scale well to a high-
speed environment. To keep latencies low, it is desirable to execute the entire protocol
stack, up to the transport layer, in hardware. Second, the communication interface of
the operating system must be streamlined to allow direct transfer of data from the
network interface to the memory space of the application program. Finally, the speed
of light (approximately 5 microseconds per kilometer) poses the ultimate limit to
latency. In general, achieving low latency requires a two-pronged approach:
1. Latency reduction. Minimize protocol-processing overhead by using streamlined
protocols executed in hardware and by improving the network interface of the
operating system.
2. Latency hiding. Modify the computational algorithm to hide latency by pipelining
communication and computation. These problems are now perhaps most fundamental
to the success of parallel and distributed computing, a fact that is increasingly being
recognized by the research community.
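As a back-of-the-envelope illustration of the propagation limit cited above (roughly 5 microseconds per kilometer), the sketch below (ours, not part of the original text) computes the lower bound that distance alone places on latency, no matter how streamlined the protocol stack is:

```python
# Propagation-only latency bound, using the ~5 us/km figure from the text.
US_PER_KM = 5.0

def min_latency_us(distance_km: float, round_trip: bool = False) -> float:
    """Lower bound (in microseconds) imposed by signal propagation alone."""
    one_way = distance_km * US_PER_KM
    return 2 * one_way if round_trip else one_way

# A 100 km metropolitan link costs at least 0.5 ms one way and 1 ms
# round trip before any protocol processing is even counted.
print(min_latency_us(100))                   # 500.0
print(min_latency_us(100, round_trip=True))  # 1000.0
```

No protocol streamlining can get below this bound; only latency hiding, overlapping communication with computation, can mask it.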
1.2.3 Software Tools and Environments
The development of parallel and distributed applications is a nontrivial process and
requires a thorough understanding of the application and the architecture. Although a
parallel and distributed system provides the user with enormous computing power and
a great deal of flexibility, this flexibility implies increased degrees of freedom which
have to be optimized in order to fully exploit the benefits of the distributed system.
For example, during software development, the developer is required to select the
optimal hardware configuration for the particular application, the best decomposition
of the problem on the hardware configuration selected, and the best communication
and synchronization strategy to be used, and so on. The set of reasonable alternatives
that have to be evaluated in such an environment is very large, and selecting the best
alternative among these is a nontrivial task. Consequently, there is a need for a set of
simple and portable software development tools that can assist the developer in
appropriately distributing the application computations to make efficient use of the
underlying computing resources. Such a set of tools should span the software life
cycle and must support the developer during each stage of application development,
starting from the specification and design formulation stages, through the
programming, mapping, distribution, scheduling phases, tuning, and debugging
stages, up to the evaluation and maintenance stages.
1.3 Current Scenario
The current scenario of parallel systems can be viewed from three
perspectives. A common concept that applies to all of the following is the idea of
Total Ownership Cost (TOC). TOC is by far the most common scale on which the level
of computer processing is assessed worldwide. It is defined as the ratio of the total
cost of implementation and maintenance to the net throughput the parallel cluster
delivers:

              TOTAL COST OF IMPLEMENTATION AND MAINTENANCE
TOC = ------------------------------------------------------------------
              NET SYSTEM THROUGHPUT (IN FLOATING POINT OPERATIONS / SEC)
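As a hypothetical numerical illustration of this ratio (the cost and throughput figures below are invented, not measurements of Parallex):

```python
def total_ownership_cost(total_cost: float, net_throughput_flops: float) -> float:
    """TOC = total cost of implementation and maintenance
    divided by net system throughput in FLOP/s."""
    return total_cost / net_throughput_flops

# Hypothetical cluster: $4,000 total cost delivering 2 GFLOP/s net
# throughput gives a TOC of $2 per MFLOP/s of sustained performance.
toc = total_ownership_cost(4_000, 2e9)
print(toc)  # 2e-06 dollars per FLOP/s
```

The same formula makes clear why adding cheap recycled nodes lowers TOC: the numerator barely grows while the denominator does.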
1.3.1 End user perspectives
Various activities, such as rendering, Adobe Photoshop workloads, and
other processes, come under this category. As the need for processing power increases
day by day, hardware cost increases with it. From the end user's perspective, the
parallel system aims to reduce expenses and avoid complexity. At this stage we are
trying to implement a parallel system that is more cost effective and user friendly.
For the end user, however, TOC is less important in most cases, because a parallel
cluster is rarely owned by a single user; in that case the net throughput of the parallel
system becomes the most crucial factor.
1.3.2 Industrial Perspective
In corporate sectors, parallel systems are extensively implemented. Such
parallel systems consist of machines that may, in theory if not in practice, have to
handle millions of nodes. From the industrial point of view, the parallel system
aims at resource isolation and at replacing large-scale dedicated commodity hardware
and mainframes. Corporate sectors often place TOC as the primary criterion by which
a parallel cluster is judged. As scalability increases, the cost of owning parallel
clusters shoots up to unmanageable heights, and our primary aim in this area is to
bring down the TOC as much as possible.
1.3.3 Developers, Researchers & Scientists Perspective
Scientific applications such as 3D simulations, high scale scientific rendering,
intense numerical calculations, complex programming logic, and large scale
implementation of algorithms (BLAS and FFT Libraries) require levels of processing
and calculation that no modern-day dedicated vector CPU could possibly meet.
Consequently, parallel systems have proven to be the only efficient
alternative for keeping pace with modern-day scientific advancement and
research. TOC is rarely a matter of concern here.
1.4 Obstacles and Why we don’t have 10 GHz today…
Fig 1.2 Transistor vs. Clock Speed
CPU performance growth as we have known it hit a wall
The figure graphs the history of Intel chip introductions by clock speed and number of
transistors. The number of transistors continues to climb, at least for now. Clock
speed, however, is a different story.
Around the beginning of 2003, you’ll note a disturbing sharp turn in the previous
trend toward ever-faster CPU clock speeds. We have added lines to show the limit
trends in maximum clock speed; instead of continuing on the previous path, as
indicated by the thin dotted line, there is a sharp flattening. It has become harder and
harder to exploit higher clock speeds due to not just one but several physical issues,
notably heat (too much of it and too hard to dissipate), power consumption (too high),
and current leakage problems.
Sure, Intel has samples of their chips running at even higher speeds in the
lab—but only by heroic efforts, such as attaching hideously impractical quantities of
cooling equipment. You won’t have that kind of cooling hardware in your office any
day soon, let alone on your lap while computing on the plane.
1.5 Myths and Realities: 2 x 3GHz < 6 GHz
So a dual-core CPU that combines two 3GHz cores practically offers 6GHz of
processing power. Right?
Wrong. Even having two threads running on two physical processors doesn’t
mean getting two times the performance. Similarly, most multi-threaded applications
won’t run twice as fast on a dual-core box. They should run faster than on a single-
core CPU; the performance gain just isn’t linear, that’s all.
Why not? First, there is coordination overhead between the cores to ensure
cache coherency (a consistent view of cache, and of main memory) and to perform
other handshaking. Today, a two- or four-processor machine isn’t really two or four
times as fast as a single CPU even for multi-threaded applications. The problem
remains essentially the same even when the CPUs in question sit on the same die.
Second, unless the two cores are running different processes, or different
threads of a single process that are well-written to run independently and almost never
wait for each other, they won’t be well utilized. (Despite this, we will speculate that
today’s single-threaded applications as actually used in the field could actually see a
performance boost for most users by going to a dual-core chip, not because the extra
core is actually doing anything useful, but because it is running the adware and
spyware that infest many users’ systems and are otherwise slowing down the single CPU
that user has today. We leave it up to you to decide whether adding a CPU to run your
spyware is the best solution to that problem.)
If you’re running a single-threaded application, then the application can only
make use of one core. There should be some speedup as the operating system and the
application can run on separate cores, but typically the OS isn’t going to be maxing
out the CPU anyway, so one of the cores will be mostly idle. (Again, the spyware can
share the OS’s core most of the time.)
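The diminishing returns described in this section are conventionally quantified by Amdahl's law (not named in the original text, but consistent with its argument); a minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can
    run in parallel across `n_cores` cores (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Even a program that is 90% parallelizable gains well under 2x on a
# dual-core CPU -- and cache-coherency overhead lowers this further.
print(round(amdahl_speedup(0.9, 2), 3))  # 1.818
# A purely serial (single-threaded) application gains nothing:
print(amdahl_speedup(0.0, 2))  # 1.0
```

This is the ideal upper bound; the coordination and handshaking costs described above push real speedups below it.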
1.6 The problem statement
So now let us summarize and define the problem statement:
• Since the growth of requirements of processing is far greater than the growth
of CPU power, and since the silicon chip is fast approaching its full capacity,
the implementation of parallel processing at every level of computing becomes
inevitable.
• There is a need to have a single and complete clustering solution which
requires minimum user interference but at the same time supports
editing/modifications to suit the user’s requirements.
• There should be no need to modify the existing applications.
• The parallel system must be able to support different platforms.
• The system should be able to fully utilize all the available hardware resources
without the need of buying any extra/special kind of hardware.
1.7 About PARALLEX
While the term parallel is often used to describe clusters, they are more
correctly described as a type of distributed computing. Typically, the term parallel
computing refers to tightly coupled sets of computation. Distributed computing is
usually used to describe computing that spans multiple machines or multiple
locations. When several pieces of data are being processed simultaneously in the same
CPU, this might be called a parallel computation, but would never be described as a
distributed computation. Multiple CPUs within a single enclosure might be used for
parallel computing, but would not be an example of distributed computing. When
talking about systems of computers, the term parallel usually implies a homogenous
collection of computers, while distributed computing typically implies a more
heterogeneous collection. Computations that are done asynchronously are more likely
to be called distributed than parallel. Clearly, the terms parallel and distributed lie at
either end of a continuum of possible meanings. In any given instance, the exact
meanings depend upon the context. The distinction is more one of connotations than
of clearly established usage.
Parallex is both a parallel and a distributed cluster, because it supports both
multiple CPUs within a single enclosure and a heterogeneous collection of
computers.
1.8 Motivation
The motivation behind this project is to provide a cheap and easy to use
solution to cater to the high performance computing requirements of organizations
without the need to install any expensive hardware.
In many organizations including our college, we have observed that when old
systems are replaced by newer ones, the older ones are generally dumped or sold at
throwaway prices. We also wanted to find a solution to use this “silicon
waste” effectively. These wasted resources can easily be added to our system as the
processing need increases, because the parallel system is linearly scalable and
hardware independent. Thus the intent is to have an environmentally friendly and
effective solution that utilizes all the available CPU power to execute applications faster.
1.9 Features of Parallex
• Parallex simplifies the cluster setup, configuration and management process.
• It supports machines with hard disks as well as diskless machines running at
the same time.
• It is flexible in design and easily adaptable.
• Parallex does not require any special kind of hardware.
• It is multi platform compatible.
• It ensures efficient utilization of silicon waste (old unused hardware).
• Parallex is scalable.
How these features are achieved and details of design will be discussed in subsequent
chapters.
1.10 Why our design is “Alternative” to parallel system?
Every renowned technology needs to evolve after a particular time, as each new
generation remedies the shortcomings of the technology that came before it. What we
have achieved is a bare-bones implementation of parallel system semantics.
While studying parallel and distributed systems, we had the advantage
of working with the latest technology. The parallel systems designed
by scientists before us were, no doubt, far more sophisticated than ours. Our
system is unique because we actually split up the task according to the processing
power of the nodes instead of just load balancing. Hence a slow node gets
a smaller task than a faster one, and all nodes deliver their output at the same
calculated time on the master node.
One difficulty we found was deciding how much of the task should be given to each
node of a heterogeneous system so that all results arrive at the same time. We worked
on this problem and developed a mathematical distribution algorithm, which was
successfully implemented and is functional. This algorithm breaks the task up
according to the speed of the CPUs by sending a test application to all nodes and
storing the return time of each node in a file. We then worked on automating the
entire system. We were using password-less secure shell (SSH) login and the network
file system (NFS). We succeeded to some extent, but full automation of the SSH and
NFS configuration was not possible; having to set up each new node manually is a
demerit of SSH and NFS. To overcome this demerit we looked for an alternative
solution, the Beowulf cluster, but after studying it we concluded that it considers all
nodes to be of the same configuration and sends tasks to all nodes equally.
To improve our system we had to think differently from the Beowulf cluster,
and we tried to make the system more cost-effective. We adopted the diskless-cluster
concept, getting rid of hard disks to cut cost and enhance machine reliability: storage
devices affect the performance of the entire system, increase the cost (due to
replacement of failed disks) and waste time in fault-finding. So we studied and
patched the Beowulf server and the Beowulf Distributed Process Space to suit our
system. We built kernel images for running diskless clients using the RARP protocol.
When a client runs the kernel image in its memory, it requests an IP from the master
node (which can also be called the server); the server assigns the client its IP and
node number. With this, our diskless cluster stands ready for parallel computing. We
then modified our code, including our own distribution algorithm, to match the new
design. The best part of our system is that no authorization setup is needed;
everything is now automatic.
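The master-side bookkeeping for booting diskless slaves can be sketched as below; the subnet, the numbering scheme and the function name are illustrative assumptions, since the real system delegates this to the RARP resolver daemon.

```python
# Sketch of the master-side bookkeeping performed when a diskless
# slave boots its kernel image and asks for an address. In the real
# system this job is done by the RARP resolver daemon; the subnet
# and data structures here are illustrative assumptions.

registered = {}   # MAC address -> (IP, node number)
next_node = [1]   # next free node number

def register_slave(mac, subnet="192.168.1."):
    """Assign (or look up) the IP and node number for a booting slave."""
    if mac in registered:            # slave rebooted: keep its identity
        return registered[mac]
    node = next_node[0]
    next_node[0] += 1
    entry = (subnet + str(node + 1), node)   # .2, .3, ... (.1 is the master)
    registered[mac] = entry
    return entry
```

Keying the table on the MAC address means a rebooted slave gets the same IP and node number back, which keeps the cluster's view of its membership stable.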
Until now we had been working on CODE LEVEL PARALLELISM, in which
we modify the code slightly to run on our system, much as MPI libraries are used to
make code execute in parallel. The next challenge was: what if we are not given
source code, but only a binary file to execute on our parallel system? We then need
to enhance the system with BINARY LEVEL PARALLELISM. We studied
openMosix. Once openMosix is installed and all the nodes are booted, the openMosix
nodes see each other in the cluster and start exchanging information about their load
levels and resource usage. Once the load increases beyond a defined level, a process
migrates to another node on the network. However, when a process demands heavy
resource usage, it may keep migrating from node to node without ever being
serviced. This is the major design flaw of openMosix, and we are working to find a
solution.
In this sense, our design is an ALTERNATIVE to these problems in the world
of parallel computing.
1.11 Innovation
Firstly, our system does not require any additional hardware if the existing
machines are well connected in a network. Secondly, even in a heterogeneous
environment with a few fast CPUs and a few slower ones, the efficiency of the system
does not drop by more than 1 to 5%, still maintaining an efficiency of around 80% for
suitably adapted applications. This is because the mathematical distribution algorithm
considers the relative processing power of each node, distributing only the amount of
load that a node can process in the calculated optimal time of the system. All the
nodes process their respective tasks and produce output at this calculated time. The
most important point about our system is its ability to use diskless nodes in the
cluster, thereby reducing hardware costs, space and the required maintenance. Also,
in the case of binary executables (when source code is not available), our system
exhibits almost 20% performance gains.
Chapter 2. Requirement Analysis
2.1 Determining the overall mission of Parallex
• User base: Students, educational institutes, small to medium business
organizations.
• Cluster usage: one part of the cluster will be fully dedicated to solving the
problem at hand, with an optional part drawing computing resources from
individual workstations. In the latter part, parallel problems will run at lower
priority.
• Software to be run on cluster: Depends upon the user base. At the cluster
management level, the system software will be Linux.
• Dedicated or shared cluster: As mentioned above it will be both.
• Extent of the cluster: computers that are all on the same subnet.
2.2 Functional Requirements for Parallex system
Functional Requirement 1
The PCs must be connected in a LAN so that the system can be used without any
obstacles.
Functional Requirement 2
There will be one master or controlling node, which will distribute the tasks
according to the processing speed of each node.
Services
Three services are to be provided on the master.
1. A network monitoring tool for resource discovery (e.g. IP addresses,
MAC addresses, UP/DOWN status, etc.)
2. The distribution algorithm, which will distribute the task according to the
current processing speed of the nodes.
3. The Parallex master script, which will send the distributed tasks to the nodes,
collect the results, integrate them, and produce the final output.
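The feed/collect/integrate shape of the master script can be sketched as follows; a local thread pool stands in for the nodes here, whereas the real dispatch goes over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_cluster(subtasks, worker):
    """Send each sub-task out, collect the partial results, and
    integrate them into a single output.

    `worker` stands in for a node executing one sub-task; in the real
    system the dispatch happens over the network rather than in a
    local thread pool, but the overall shape is the same.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        partials = list(pool.map(worker, subtasks))
    return sum(partials)  # integration step: combine the partial results

if __name__ == "__main__":
    # Sum the squares of 1..12, split into three range sub-tasks.
    ranges = [(1, 5), (5, 9), (9, 13)]
    print(run_on_cluster(ranges, lambda r: sum(i * i for i in range(*r))))  # 650
```

Because `pool.map` preserves the order of the sub-tasks, the integration step sees the partial results in a deterministic order regardless of which "node" finishes first.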
Functional Requirement 3
The final size of the executable code must be such that it fits within the limited
memory available on each machine.
Functional Requirement 4
This product will only be used to speed up applications that already exist in
the enterprise.
2.3 Non-Functional Requirements for system
- Performance
Even in a heterogeneous environment with a few fast CPUs and a few slower ones,
the efficiency of the system does not drop by more than 1 to 5%, still maintaining an
efficiency of around 80% for suitably adapted applications. This is because the
mathematical distribution algorithm considers the relative processing power of each
node, distributing only the amount of load that a node can process in the calculated
optimal time of the system. All the nodes process their respective tasks and produce
output at this calculated time. The most important point about our system is its ability
to use diskless nodes in the cluster, thereby reducing hardware costs, space and the
required maintenance. Also, in the case of binary executables (when source code is
not available), our system exhibits almost 20% performance gains.
- Cost
While a system of n parallel processors is less efficient than a single processor that
is n times faster, a parallel system is often cheaper to build. Parallel computation is
used for tasks that require very large amounts of computation, take a lot of time, and
can be divided into n independent subtasks. In recent years most high-performance
computing systems, also known as supercomputers, have had parallel architectures.
- Manufacturing costs
No extra hardware is required beyond the cost of setting up the LAN.
- Benchmarks
There are at least three reasons for running benchmarks. First, a benchmark will
provide us with a baseline. If we make changes to our cluster or if we suspect
problems with our cluster, we can rerun the benchmark to see if performance is really
any different. Second, benchmarks are useful when comparing systems or cluster
configurations. They can provide a reasonable basis for selecting between
alternatives. Finally, benchmarks can be helpful with planning.
For benchmarking we will use a 3D rendering tool named POV-Ray (Persistence of
Vision Raytracer; see the Appendix for more details).
- Hardware required
i686-class PCs (Linux with a 2.6.x kernel installed, intranet connection)
Switch (100/10T)
Serial port connectors
100BASE-T LAN cable, RJ45 connectors
- Software Resources Required
Linux (2.6.x kernel)
Intel compiler suite (non-commercial)
LSB (Linux Standard Base) set of GNU kits with GNU CC/C++/F77/LD/AS
GNU Krell monitor
- Number of PCs connected in LAN
8 nodes in the LAN.
Chapter 3. Project Plan
Plan of execution for the project was as follows (activity, software used, number of
days):

1. Project planning: choosing the domain, identifying key areas of work,
requirement analysis (no software; 10 days)
2. Basic installation of Linux (Linux 2.6.x kernel; 3 days)
3. Brushing up on C programming skills (no software; 5 days)
4. Shell scripting (Linux 2.6.x kernel, GNU Bash; 12 days)
5. C programming in the Linux environment (GNU C compiler suite; 5 days)
6. A demo project, a universal Sudoku solver, to familiarize ourselves with the
Linux programming environment (GNU C compiler suite, Intel compiler suite
non-commercial; 16 days)
7. Study of advanced Linux tools and installation of packages and Red Hat
RPMs (iptraf, mc, tar, rpm, awk, sed, gnuplot, strace, gdb, etc.; 10 days)
8. Studying networking basics and network configuration in Linux
(no software; 8 days)
9. Recompiling, patching and analyzing the system kernel (Linux kernel 2.6.x,
GNU C compiler; 3 days)
10. Study and implementation of advanced networking tools: SSH and NFS
(ssh and OpenSSH, nfs; 7 days)
11. Preparing the preliminary design of the total workflow of the project;
deciding the modules for overall execution and dividing the areas of
concentration among the project group; building the Stage I prototype
(all of the above; 17 days)
12. Building the Stage II prototype, replacing ssh with a custom-made
application (all of the above; 15 days)
13. Building the Stage III prototype, making the diskless cluster
(all of the above; 10 days)
14. Testing and building final packages (all of the above; 10 days)

Table 1.1 Project Plan
Chapter 4. System Design
Generally speaking, the design process of a distributed system involves three main
activities:
(1) designing the communication system that enables the distributed system resources
and objects to exchange information,
(2) defining the system structure (architecture) and the system services that enable
multiple computers to act as a system rather than as a collection of computers, and
(3) defining the distributed computing programming techniques to develop parallel
and distributed applications.
Based on this notion of the design process, the distributed system design framework
can be described in terms of three layers:
(1) network, protocol, and interface (NPI) layer,
(2) system architecture and services (SAS) layer, and
(3) distributed computing paradigms (DCP) layer. In what follows, we describe the
main design issues to be addressed in each layer.
Fig. 4.1 Design Framework
• Communication network, protocol, and interface layer. This layer describes the
main components of the communication system that will be used for passing control
and information among the distributed system resources. This layer is decomposed
into three sublayers: network type, communication protocols, and network interfaces.
• Distributed system architecture and services layer. This layer represents the
designer’s and system manager’s view of the system. SAS layer defines the structure
and architecture and the system services (distributed file system, concurrency control,
redundancy management, load sharing and balancing, security service, etc.) that must
be supported by the distributed system in order to provide a single-image computing
system.
• Distributed computing paradigms layer. This layer represents the programmer
(user) perception of the distributed system. This layer focuses on the programming
paradigms that can be used to develop distributed applications. Distributed computing
paradigms can be broadly characterized based on the computation and communication
models. Parallel and distributed computations can be described in terms of two
paradigms: functional-parallel and data-parallel. In the functional-parallel
paradigm, the computations are divided into distinct functions which are then
assigned to different computers. In the data-parallel paradigm, all the computers run
the same program, a single program, multiple data (SPMD) stream, but each
computer operates on a different data stream.
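The data-parallel (SPMD) paradigm can be illustrated with a minimal sketch: every simulated node runs the identical program and differs only in its rank, operating on its own slice of the data. The slicing scheme here is an illustrative assumption.

```python
def spmd_step(rank, size, data):
    """The same program runs on every node; only `rank` differs.
    Each node operates on its own contiguous slice of the data."""
    chunk = len(data) // size
    lo = rank * chunk
    hi = len(data) if rank == size - 1 else lo + chunk
    return sum(x * x for x in data[lo:hi])   # the local computation

def spmd_run(size, data):
    """Simulate `size` nodes executing the identical program, then
    reduce the partial results on the master."""
    return sum(spmd_step(r, size, data) for r in range(size))

if __name__ == "__main__":
    print(spmd_run(3, list(range(10))))  # same answer as one node would give
```

However many nodes participate, the reduced result is identical; only the wall-clock time changes, which is the point of the paradigm.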
With reference to Fig. 4.1, Parallex can be described as follows:
Fig. 4.2 Parallex Design
Chapter 5. Implementation Details
The goal of the project is to provide an efficient system that will handle process
parallelism with the help of Clusters. This parallelism will thereby reduce the time of
execution. Currently we form a cluster of 8 nodes. Using a single computer to
execute a heavy process takes a lot of time, so we form a cluster and execute such
processes in parallel by dividing each process into a number of sub-processes.
Depending on the nodes in the cluster, we migrate the sub-processes to those nodes;
when execution is over, the outputs they produce are brought back to the master
node. By doing this we reduce process execution time and increase CPU utilization.
5.1 Hardware Architecture
We have implemented a shared-nothing parallel architecture using a
coarse-grained cluster structure. The interconnect is an ordinary 8-port switch,
optionally on a Class B or Class C network. It is a three-level architecture:
1. Master topology
2. Slave topology
3. Network interconnect
1. The Master is a Linux machine running a 2.6.x or 2.4.x kernel (both under
testing). It runs the parallel server and contains the application interface that drives
the remaining machines. The master runs a network-scanning script to detect all the
slaves that are alive and retrieves all the necessary information about each slave. To
determine the load on each slave just before the main application is processed, the
master sends a small diagnostic application to the slave to estimate the load it can
take at that moment. Having collected all the relevant information, it does all the
scheduling: implementing the parallel algorithms (distributing the tasks according to
processing power and current load), making use of CPU extensions (MMX, SSE,
3DNow!) depending on the slave architecture, and everything except the execution
of the program itself. It accepts the input/task to be executed and allocates the tasks to
2. The Slave is a single system cluster image (SSCI) node, dedicated to
processing. It accepts a sub-task along with the necessary library modules, executes
it, and returns the output to the master. In our case the slaves are multi-boot-capable
systems: at one point in time they can be diskless cluster hosts, at another they might
behave as general-purpose cluster nodes, and at yet another they can act as normal
PCs handling routine office and home tasks. In the case of diskless machines, the
slave boots from a pre-created, appropriately patched kernel image.
3. The network interconnection merges the master and slave topologies. It makes
use of an 8-port switch, RJ45 connectors and CAT 5 cables. It is a star topology in
which the master and the slaves are interconnected through the switch.
Fig. 5.1 Parallel System H/W Architecture
Cluster monitoring: each slave runs a server that collects the kernel, processing,
I/O, memory and CPU details from the /proc virtual file system and forwards them
to the master node (which here acts as a client to the server running on each slave).
A user-space program plots the data interactively on the master's screen, showing
the CPU, memory and I/O details of each node separately.
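A slave-side collector of this kind can be sketched as a parser for the `key: value kB` lines that the /proc virtual file system exposes in files such as /proc/meminfo; the built-in sample keeps the sketch runnable off-cluster.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict
    of kilobyte counts. This is the kind of sample each slave's
    monitoring server forwards to the master node."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields and fields[0].isdigit():
            stats[key.strip()] = int(fields[0])
    return stats

if __name__ == "__main__":
    # On a Linux node this reads the live file; the fallback sample
    # keeps the sketch runnable elsewhere.
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = "MemTotal: 2048 kB\nMemFree: 512 kB\n"
    info = parse_meminfo(text)
    print(info.get("MemTotal"), info.get("MemFree"))
```

The same parsing approach extends to /proc/stat and /proc/loadavg for the CPU and I/O figures mentioned above.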
5.2 SOFTWARE ARCHITECTURE:-
This architecture consists of two parts i.e.
1. Master Architecture
2. Slave Architecture
The Master consists of the following levels.
1. LinuxBIOS: LinuxBIOS usually loads a Linux kernel.
2. Linux: the platform on which the Master runs.
3. SSCI + Beoboot: this level extracts the single system cluster image used by
the slave nodes.
4. Fedora Core / Red Hat: the actual operating system running on the Master.
5. System services: essential services running on the Master, e.g. the RARP
resolver daemon.
The Slave inherits from the Master the following levels.
1. Linux BIOS
2. Linux
3. SSCI
Fig 5.2 Parallel System S/W Architecture
Parallex is broadly divided in to following Modules:
1. Scheduler: this is the heart of our system. With a radically new approach to
data- and instruction-level distribution, we have implemented a completely
optimal heterogeneous cluster technology. We allocate tasks based on the
actual processing capability of each node, not on the rated GHz figure in the
system's manual. The task allocation is dynamic, and the scheduling policy is
based on the POSIX scheduling implementation. We are also capable of
implementing preemption, which we currently do not do, given that systems
such as Linux and FreeBSD are already capable of industry-level preemption.
2. Job/instruction allocator: this is a set of remote-fork-like utilities that
allocates the jobs to the nodes. Unlike traditional cluster technology, this job
allocator can execute in disconnected mode, so that temporary disconnections
and the resulting network latency affect the system substantially less.
3. Accounting: we have written a utility, the remote cluster monitor, which
provides us with samples of results from all the nodes and information about
CPU load, temperature and memory statistics. We propose that, with less
than 0.2% CPU power consumption, our network monitoring utility can
sample over 1000 nodes in less than 3 seconds.
4. Authentication: all transactions between the nodes are 128-bit encrypted and
do not require root privileges to run; only a common user must exist on all
the standalone nodes. For the diskless part, we remove even this restriction.
5. Resource discovery: we run our own socket-layer resource discovery utility,
which discovers any additional nodes and also reports when a resource has
been lost. Any additional hardware capable of being used as part of the
parallel system, such as a processor added to a system or a processor replaced
with a dual-core one, is likewise reported continually.
The SupeThe SupeThe SupeThe Super Computerr Computerr Computerr Computer
AISSMS “College Of Engineering”AISSMS “College Of Engineering”AISSMS “College Of Engineering”AISSMS “College Of Engineering” - 28 -
6. Synchronizer: the central balancing component of the cluster. Since the
cluster can simultaneously run both diskless and standalone nodes as parts of
the same cluster, the synchronizer queues the output in real time so that data
is not mixed up and the results remain consistent. It performs instruction
dependency analysis and also uses pipelines in the network to make the
interconnect more communicative.
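The resource-discovery module described above can be sketched as a plain TCP reachability probe; the port number is whatever the slave-side server listens on (an assumption here), and a real implementation would loop over the subnet and also report nodes that disappear.

```python
import socket

def node_alive(host, port, timeout=0.5):
    """Return True if `host` answers on `port` within `timeout`
    seconds. A discovery utility loops this over the subnet's
    addresses and reports nodes coming up or going down."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def discover(hosts, port):
    """Map each candidate host to its UP/DOWN status."""
    return {h: node_alive(h, port) for h in hosts}
```

A follow-up step in the real utility would diff successive `discover` snapshots to detect lost resources, as the module description requires.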
5.3 Description for software behavior
The end user submits the process/application to the administrator when the
application is source-based, and the cluster administrator owns the responsibility of
explicitly parallelizing the application for maximum exploitation of the parallel
architectures within the CPU and across the cluster nodes. When the application is
binary (no source available), the user may submit the code directly to the master
node's program acceptor, which runs the application with somewhat lower efficiency
than a source submission to the administrator. The system as a whole is then
responsible for minimizing the processing time, which in turn increases throughput
and speeds up processing.
5.3.1 Events
1. System Installation
2. Network initialization
3. Server and host configuration
4. Take input
5. Parallel execution
6. Send response
5.3.2 States
1. System Ready
2. System Busy
3. System Idle
Chapter 6. Technologies Used
6.1 General terms
We will now briefly define the general terms that will be used in further descriptions
or are related to our system.
Cluster: - An interconnection of a large number of computers working together in a
closely synchronized manner to achieve higher performance, scalability and net
computational power.
Master: - The server machine which acts as the administrator of the entire parallel
cluster and performs task scheduling.
Slave: - A client node which executes the tasks given to it by the Master.
SSCI: - Single System Cluster Image is the idea of implementing the cluster nodes
as a single image, in which the nodes behave as if they were additional processors,
add-on RAM, etc. of the controlling master computer. This is the basic theory behind
cluster-level parallelism. Example implementations are multi-node NUMA
(IBM/Sequent) multi-quad computers and SGI Altix servers. However, the idea of a
true SSCI remains unimplemented for heterogeneous clusters used in parallel
processing, except in supercomputing clusters such as Thunder and the Earth Simulator.
RARP: - Reverse Address Resolution Protocol is a network-layer protocol used to
resolve an IP address from a given hardware address (such as an Ethernet/MAC
address).
BProc: - The Beowulf Distributed Process Space (BProc) is a set of kernel
modifications, utilities and libraries which allow a user to start processes on other
machines in a Beowulf-style cluster. Remote processes started with this mechanism
appear in the process table of the cluster's front-end machine. This allows remote
process management using the normal UNIX process control facilities. Signals are
transparently forwarded to remote processes, and exit status is received through the
usual wait() mechanisms.
Having discussed the basic concepts of parallel and distributed systems, the problems
in this field, and an overview of Parallex, we now move forward with the requirement
analysis and design details of our system.
Chapter 7. Testing
Logic Coverage/Decision-Based Test Cases (test case name; test procedure;
pre-condition; expected result; reference to detailed design):

1. Initial_frame_fail; initial frame not defined; none; Parallex should give an
error and exit; distribution algorithm.
2. Final_frame_fail; final frame not defined; none; Parallex should give an
error and exit; distribution algorithm.
3. Initial_final_full; initial and final frame given; none; Parallex should
distribute according to speed; distribution algorithm.
4. Input_file_name_blank; no input file given; none; input file not found;
Parallex Master.
5. Input_parameters_blank; no parameters defined at the command line; none;
exit on error; Parallex Master.

Table 7.1 Logic Coverage/Decision-Based Test Cases
Initial Functional Test Cases for Parallex (use case; function being tested;
initial system state; input; expected output):

1. System Startup; Master is started when the switch is turned "on"; Master is
off; activate the "on" switch; Master is ON.
2. System Startup; nodes are started when the switch is turned "on"; nodes are
off; activate the "on" switch; nodes are ON.
3. System Startup; nodes are assigned IPs by the Master; booting; get boot
image from the Master; Master shows that the nodes are UP.
4. System Shutdown; system is shut down when the switch is turned "off";
system is on and not servicing a customer; activate the "off" switch; system
is off.
5. System Shutdown; connection to the Master is terminated when the system
is shut down; system has just been shut down; none; verify from the Master
side that a connection to the slave no longer exists.
6. Session; system reads a customer's program; system is on and not servicing
a customer; insert a readable code/program; program accepted.
7. Session; system rejects an unreadable program; system is on and not
servicing a customer; insert an unreadable code/program; program is
rejected, the system displays an error screen and is ready to start a new
session.
8. Session; system accepts the customer's program; system is asking for entry
of the RANGE of calculation; enter a RANGE; system gets the RANGE.
9. Session; system breaks up the task; system is breaking the task according to
the processing speed of the nodes; perform the distribution algorithm;
system breaks up the task and writes it into a file.
10. Session; system feeds the tasks to the nodes for processing; system feeds
tasks to the nodes for execution; send tasks; system displays a menu of the
tasks running on the nodes.
11. Session; session ends when all nodes give their output; system is collecting
the output of all nodes; get the output from all nodes; system displays the
output and quits.

Table 7.2 Functional Test Cases
Cyclomatic Complexity:
Control Flow Graph of a System:
Fig 7.1 Cyclomatic Diagram for the system
Cyclomatic complexity is a software metric (measurement) in computational
complexity theory. It was developed by Thomas McCabe and is used to measure the
complexity of a program. It directly measures the number of linearly independent
paths through a program's source code.
Computation of Cyclomatic Complexity:
In the above flow graph
E = no. of edges = 9
N = no. of nodes = 7
M = E – N + 2
= 9 – 7 + 2
= 4
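The same computation can be written as a one-liner using the general P-component form of McCabe's formula:

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's metric M = E - N + 2P for a control-flow graph with
    E edges, N nodes, and P connected components."""
    return edges - nodes + 2 * components

if __name__ == "__main__":
    # The flow graph of Fig 7.1: 9 edges, 7 nodes, one component.
    print(cyclomatic_complexity(9, 7))  # 4
```

The value 4 means there are four linearly independent paths through the graph, so at least four test cases are needed for full branch-path coverage.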
Console and Black Box Testing:
Console Test Cases (test procedure; pre-condition; expected result; actual result):

1. Testing in a Linux terminal; terminal variables have default values;
xterm-related tools are disabled; no graphical information displayed.
2. Invalid number of arguments; all nodes are up; error message; proper usage
shown.
3. Pop-up terminals for different nodes; all nodes are up; number of pop-ups
equals number of cores in alive nodes; number of pop-ups equals number of
cores in alive nodes.
4. 3D rendering on a single machine; all necessary files in place; live 3D
rendering; shows the frame being rendered.
5. 3D rendering on the Parallex system; all nodes are up; status of rendering;
rendered video.
6. MPlayer testing; rendered frames; animation in .avi format; rendered
video (.avi).

Table 7.3 Console Test Cases
Black Box Test Cases (test procedure; pre-condition; expected result; actual
result):

1. New node comes up; node is down; status message displayed by the NetMon
tool; message "Node UP".
2. Node goes down; node is up; status message displayed by the NetMon tool;
message "Node DOWN".
3. Node information; nodes are up; internal information of the nodes; status,
IP, MAC address, RAM, etc.
4. Main task submission; application is compiled; next module called
(distribution algorithm); processing speed of the nodes.
5. Main task submission with faulty input; application is compiled; error;
error displayed and exit.
6. Distribution algorithm; RANGE received; task broken up according to the
processing speed of the nodes; breaks up the RANGE and generates scripts.
7. Cluster feed script; all nodes up; tasks sent to individual machines for
execution; display shows the task executed on each machine.
8. Result assembly; all machines have returned results; final result calculation;
final result displayed on screen.
9. Fault tolerance; machine(s) go down in between execution; error recovery
script is executed; task resent to all alive machines.

Table 7.4 Black Box Test Cases
System Usage Specification outline:
Fig 7.2 System usage pattern
Fig 7.3 Histogram
Runtime Benchmark:
Fig 7.4 One frame from Complex Rendering on Parallex: Simulation of an explosion
The following is an output comparison of the same application, run with the same
parameters, on a standalone machine, an existing Beowulf parallel cluster, and our
Parallex cluster system.
Application: POV-Ray
Hardware Specifications:
NODE 0: P4, 2.8 GHz
NODE 1: Core 2 Duo, 2.8 GHz
NODE 2: AMD 64, 2.01 GHz
NODE 3: AMD 64, 1.80 GHz
NODE 4: Celeron D, 2.16 GHz
Benchmark Results:

Time        Single Machine   Existing Parallel      Parallex Cluster
                             System (4 nodes)       System (4 nodes)
Real time   14m 44.3s        3m 41.61s              3m 1.62s
User time   13m 33.2s        10m 4.67s              9m 30.75s
Sys time    2m 2.26s         0m 2.26s               0m 2.31s

Table 7.5 Benchmark Results
Note: the cluster's user time is approximately the sum of the per-node user times.
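The speedup implied by Table 7.5 is easy to compute from the real times. The sketch below hard-codes the table's values; note that the per-node efficiency comes out above 1 only because the standalone baseline is a single P4, not one of the cluster's faster nodes.

```python
def to_seconds(minutes, seconds):
    """Convert an 'Xm Ys' entry from Table 7.5 into seconds."""
    return minutes * 60 + seconds

single   = to_seconds(14, 44.3)   # standalone machine, real time
beowulf  = to_seconds(3, 41.61)   # existing parallel system, 4 nodes
parallex = to_seconds(3, 1.62)    # Parallex cluster, 4 nodes

speedup_beowulf  = single / beowulf    # ~3.99x over the single machine
speedup_parallex = single / parallex   # ~4.87x over the single machine
efficiency = speedup_parallex / 4      # per-node efficiency on 4 nodes
```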
Chapter 8. Cost Estimation
Since processing requirements are growing far faster than CPU power, and since
the silicon chip is fast approaching its full capacity, the implementation of parallel
processing at every level of computing becomes inevitable.
Therefore we propose that in the coming years parallel processing, and the
algorithms that sophisticate it, such as the ones we have designed and implemented,
will form the heart of modern computing. Not surprisingly, parallel processing has
already begun to penetrate the modern computing market directly, in the form of
multi-core processors such as Intel's dual-core and quad-core chips.
Two of our primary aims, a simple implementation and minimal administrative
overhead, make deploying Parallex simple and effective. Parallex can be easily
deployed in any sector of modern computing where CPU-intensive applications are an
important part of growth.
While a system of n parallel processors is less efficient than a single processor
that is n times faster, the parallel system is often cheaper to build. Parallel
computation is used for tasks that require very large amounts of computation, take a
lot of time, and can be divided into n independent subtasks. In recent years, most
high-performance computing systems, also known as supercomputers, have had
parallel architectures.
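The trade-off in the paragraph above is usually quantified by Amdahl's law: if a fraction p of a task can be parallelized, n processors give a speedup of at most 1 / ((1 - p) + p/n). A short sketch:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: the maximum speedup achievable with n processors
    when a fraction p of the work is parallelizable (0 <= p <= 1)."""
    return 1.0 / ((1.0 - p) + p / n)
```

Even a task that is 95% parallelizable tops out near 3.5x on 4 nodes, which is why dividing work into genuinely independent subtasks matters so much.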
Cost effectiveness is one of the major achievements of our Parallex system.
We need no external or expensive hardware or software, so the system itself is
inexpensive. Our system is based on heterogeneous clusters in which individual CPU
power is not an issue, thanks to our mathematical distribution algorithm: overall
efficiency drops by no more than 5% when a few slower machines are included.
So we can say that we treat silicon waste as a challenge to our system, putting
outdated, slower CPUs back to work; hence the design is environmentally friendly.
Another feature of our system is the use of diskless nodes, which reduces the total
cost by approximately 20%, since the nodes need no storage devices of their own; in
place of separate storage devices we use a centralized storage solution. Last but not
least, all of our software tools are open source.
Hence, we conclude that our Parallex system is one of the most cost-effective
systems in its genre.
Chapter 9. User Manual
9.1 Dedicated cluster setup
For a dedicated cluster with one master and many diskless slaves, all the user has to
do is install the RPMs supplied on the installation disk on the master. The BProc
configuration file will then be found at /etc/bproc/config.
9.1.1 BProc Configuration
Main configuration file:
/etc/bproc/config
• Edit with your favorite text editor
• Lines starting with # are comments
• The rest are keywords followed by arguments
• Specify interface:
interface eth0 10.0.4.1 255.255.255.0
• eth0 is interface connected to nodes
• IP of master node is 10.0.4.1
• Netmask of master node is 255.255.255.0
• Interface will be configured when BProc is started
Specify range of IP addresses for nodes:
iprange 0 10.0.4.10 10.0.4.14
• Start assigning IP addresses at node 0
• First address is 10.0.4.10, last is 10.0.4.14
• The size of this range determines the number of nodes in the cluster
• Next entries are default libraries to be installed on nodes
• Can explicitly specify libraries or extract library information from an
executable
• Need to add entry to install extra libraries
librariesfrombinary /bin/ls /usr/bin/gdb
• The bplib command can be used to see libraries that will be loaded
Next line specifies the name of the phase 2 image
bootfile /var/bproc/boot.img
• Should be no need to change this
• Need to add a line to specify kernel command line
• kernelcommandline apm=off console=ttyS0,19200
• Turn APM support off (since these nodes don’t have any)
• Set console to use ttyS0 and speed to 19200
• This is used by beoboot command when building phase 2 image
Final lines specify Ethernet addresses of nodes, examples given
#node 0 00:50:56:00:00:00
#node 00:50:56:00:00:01
• Needed so node can learn its IP address from master
• First 0 is optional, assign this address to node 0
• Can automatically determine and add ethernet addresses using the
nodeadd command
• We will use this command later, so no need to change now
• Save file and exit from editor
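Putting the directives above together, a minimal /etc/bproc/config might look like the following. The addresses and MAC values are the illustrative ones used above, not defaults.

```
# /etc/bproc/config -- illustrative example
interface eth0 10.0.4.1 255.255.255.0
iprange 0 10.0.4.10 10.0.4.14
librariesfrombinary /bin/ls /usr/bin/gdb
bootfile /var/bproc/boot.img
kernelcommandline apm=off console=ttyS0,19200
node 0 00:50:56:00:00:00
node 00:50:56:00:00:01
```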
Other configuration files
/etc/bproc/config.boot
• Specifies PCI devices that are going to be used by the nodes at boot time
• Modules are included in phase 1 and phase 2 boot images
• By default the node will try all network interfaces it can find
/etc/bproc/node_up.conf
• Specifies actions to be taken in order to bring a node up
• Load modules
• Configure network interfaces
• Probe for PCI devices
• Copy files and special devices out to node
9.1.2 Bringing up BProc
Check BProc will be started at boot time
# chkconfig --list clustermatic
• Restart master daemon and boot server
# service bjs stop
# service clustermatic restart
# service bjs start
• Load the new configuration
• BJS uses BProc, so needs to be stopped first
• Check interface has been configured correctly
# ifconfig eth0
• Should have IP address we specified in config file
9.1.3 Build a Phase 2 Image
• Run the beoboot command on the master
# beoboot -2 -n --plugin mon
• -2 this is a phase 2 image
• -n image will boot over network
• --plugin add plugin to the boot image
• The following warning messages can be safely ignored
WARNING: Didn’t find a kernel module called gmac.o
WARNING: Didn’t find a kernel module called bmac.o
• Check phase 2 image is available
# ls -l /var/clustermatic/boot.img
9.1.4 Loading the Phase 2 Image
• Two Kernel Monte is a piece of software which will load a new
Linux kernel replacing one that is already running
• This allows you to use Linux as your boot loader!
• Using Linux means you can use any network that Linux supports.
• There is no PXE bios or Etherboot support for Myrinet, Quadrics or Infiniband
• “Pink” network boots on Myrinet which allowed us to avoid buying a 1024
port ethernet network
• Currently supports x86 (including AMD64) and Alpha
9.1.5 Using the Cluster
bpsh
• Migrates a process to one or more nodes
• Process is started on front-end, but is immediately migrated onto nodes
• Effect similar to rsh command, but no login is performed and no shell is
started
• I/O forwarding can be controlled
• Output can be prefixed with node number
• Run date command on all nodes which are up
# bpsh -a -p date
• See other arguments that are available
# bpsh -h
bpcp
• Copies files to a node
• Files can come from master node, or other nodes
• Note that a node only has a ram disk by default
• Copy /etc/hosts from master to /tmp/hosts on node 0
# bpcp /etc/hosts 0:/tmp/hosts
# bpsh 0 cat /tmp/hosts
9.1.6 Managing the Cluster
bpstat
• Shows status of nodes
• up node is up and available
• down node is down or can’t be contacted by master
• boot node is coming up (running node_up)
• error an error occurred while the node was booting
• Shows owner and group of node
• Combined with permissions, determines who can start jobs on the node
• Shows permissions of the node
---x------ execute permission for node owner
------x--- execute permission for users in node group
---------x execute permission for other users
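The permission bits that bpstat prints map directly onto the octal mode accepted by bpctl (for example, mode 111 grants execute to owner, group, and others). The following is a hypothetical sketch of that mapping for illustration, not bpstat's actual code:

```python
def node_perms(mode):
    """Render a BProc node permission mode the way bpstat displays it:
    a leading file-type dash plus rwx triples in which only the
    execute bit is meaningful for nodes."""
    out = "-"
    for shift in (6, 3, 0):                    # owner, group, other
        out += "--x" if (mode >> shift) & 0o1 else "---"
    return out
```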
bpctl
• Controls a node's status
• Reboot node 1 (takes about a minute)
# bpctl -S 1 -R
• Set state of node 0
# bpctl -S 0 -s groovy
• Only up, down, boot and error have special meaning; everything else
means not down
• Set owner of node 0
# bpctl -S 0 -u nobody
• Set permissions of node 0 so anyone can execute a job
# bpctl -S 0 -m 111
bplib
• Manage libraries that are loaded on a node
• List libraries to be loaded
# bplib -l
• Add a library to the list
# bplib -a /lib/libcrypt.so.1
• Remove a library from the list
# bplib -d /lib/libcrypt.so.1
9.1.7 Troubleshooting techniques
• The tcpdump command can be used to check for node activity during and after a
node has booted
• Connect a cable to serial port on node to check console output for errors in boot
process
• Once node reaches node_up processing, messages will be logged in
/var/log/bproc/node.N (where N is node number)
9.2 Shared Cluster Setup
Once you have the basic installation completed, you'll need to configure the system.
Many of the tasks are no different for machines in a cluster than for any other system.
For other tasks, being part of a cluster impacts what needs to be done. The following
subsections describe the issues associated with several services that require special
considerations.
9.2.1 DHCP
Dynamic Host Configuration Protocol (DHCP) is used to supply network
configuration parameters, including IP addresses, host names, and other information
to clients as they boot. With clusters, the head node is often configured as a DHCP
server and the compute nodes as DHCP clients. There are two reasons to do this. First,
it simplifies the installation of compute nodes since the information DHCP can supply
is often the only thing that is different among the nodes. Since a DHCP server can
handle these differences, the node installation can be standardized and automated. A
second advantage of DHCP is that it is much easier to change the configuration of the
network. You simply change the configuration file on the DHCP server, restart the
server, and reboot each of the compute nodes.
The basic installation is rarely a problem. The DHCP system can be installed as a part
of the initial Linux installation or after Linux has been installed. The DHCP server
configuration file, typically /etc/dhcpd.conf, controls the information distributed to
the clients. If you are going to have problems, the configuration file is the most likely
source.
The DHCP configuration file may be created or changed automatically when some
cluster software is installed. Occasionally, the changes may not be done optimally or
even correctly so you should have at least a reading knowledge of DHCP
configuration files. Here is a heavily commented sample configuration file that
illustrates the basics. (Lines starting with "#" are comments.)
# A sample DHCP configuration file.
# The first commands in this file are global,
# i.e., they apply to all clients.
# Only answer requests from known machines,
# i.e., machines whose hardware addresses are given.
deny unknown-clients;
# Set the subnet mask, broadcast address, and router address.
option subnet-mask 255.255.255.0;
option broadcast-address 172.16.1.255;
option routers 172.16.1.254;
# This section defines individual cluster nodes.
# Each subnet in the network has its own section.
subnet 172.16.1.0 netmask 255.255.255.0 {
group {
# The first host, identified by the given MAC address,
# will be named node1.cluster.int, will be given the
# IP address 172.16.1.1, and will use the default router
# 172.16.1.254 (the head node in this case).
host node1{
hardware ethernet 00:08:c7:07:68:48;
fixed-address 172.16.1.1;
option routers 172.16.1.254;
option domain-name "cluster.int";
}
host node2{
hardware ethernet 00:08:c7:07:c1:73;
fixed-address 172.16.1.2;
option routers 172.16.1.254;
option domain-name "cluster.int";
}
# Additional node definitions go here.
}
}
# For servers with multiple interfaces, this entry says to ignore requests
# on specified subnets.
subnet 10.0.32.0 netmask 255.255.248.0 { not authoritative; }
As shown in this example, you should include a subnet section for each subnet on
your network. If the head node has an interface for the cluster and a second interface
connected to the Internet or your organization's network, the configuration file will
have a group for each interface or subnet. Since the head node should answer DHCP
requests for the cluster but not for the organization, DHCP should be configured so
that it will respond only to DHCP requests from the compute nodes.
9.2.2 NFS
A network filesystem is a filesystem that physically resides on one computer (the file
server), which in turn shares its files over the network with other computers on the
network (the clients). The best-known and most common network filesystem is
Network File System (NFS). In setting up a cluster, designate one computer as your
NFS server. This is often the head node for the cluster, but there is no reason it has to
be. In fact, under some circumstances, you may get slightly better performance if you
use different machines for the NFS server and head node. Since the server is where
your user files will reside, make sure you have enough storage. This machine is a
likely candidate for a second disk drive or raid array and a fast I/O subsystem. You
may even want to consider mirroring the filesystem using a small high-availability
cluster.
Why use an NFS? It should come as no surprise that for parallel programming you'll
need a copy of the compiled code or executable on each machine on which it will run.
You could, of course, copy the executable over to the individual machines, but this
quickly becomes tiresome. A shared filesystem solves this problem. Another
advantage to an NFS is that all the files you will be working on will be on the same
system. This greatly simplifies backups. (You do backups, don't you?) A shared
filesystem also simplifies setting up SSH, as it eliminates the need to distribute keys.
(SSH is described later in this chapter.) For this reason, you may want to set up NFS
before setting up SSH. NFS can also play an essential role in some installation
strategies.
If you have never used NFS before, setting up the client and the server are slightly
different, but neither is particularly difficult. Most Linux distributions come with most
of the work already done for you.
9.2.2.1 Running NFS
Begin with the server; you won't get anywhere with the client if the server isn't
already running. Two things need to be done to get the server running. The file
/etc/exports must be edited to specify which machines can mount which directories,
and then the server software must be started. Here is a single line from the file
/etc/exports on the server amy:
/home basil(rw) clara(rw) desmond(rw) ernest(rw) george(rw)
This line gives the clients basil, clara, desmond, ernest, and george read/write access
to the directory /home on the server. Read access is the default. A number of other
options are available and could be included. For example, the no_root_squash option
could be added if you want to edit root permission files from the nodes.
Had a space been inadvertently included between basil and (rw), read access would
have been granted to basil and read/write access would have been granted to all other
systems. (Once you have the systems set up, it is a good idea to use the command
showmount -a to see who is mounting what.)
Once /etc/exports has been edited, you'll need to start NFS. For testing, you can use
the service command as shown here
[root@fanny init.d]# /sbin/service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
[root@fanny init.d]# /sbin/service nfs status
rpc.mountd (pid 1652) is running...
nfsd (pid 1666 1665 1664 1663 1662 1661 1660 1657) is running...
rpc.rquotad (pid 1647) is running...
(With some Linux distributions, when restarting NFS, you may find it necessary to
explicitly stop and restart both nfslock and portmap as well.) You'll want to change
the system configuration so that this starts automatically when the system is rebooted.
For example, with Red Hat, you could use the serviceconf or chkconfig commands.
For the client, the software is probably already running on your system. You just need
to tell the client to mount the remote filesystem. You can do this several ways, but in
the long run, the easiest approach is to edit the file /etc/fstab, adding an entry for the
server. Basically, you'll add a line to the file that looks something like this:
amy:/home /home nfs rw,soft 0 0
In this example, the local system mounts the /home filesystem located on amy as the
/home directory on the local machine. The filesystems may have different names. You
can now manually mount the filesystem with the mount command
[root@ida /]# mount /home
When the system reboots, this will be done automatically.
When using NFS, you should keep a couple of things in mind. The mount point,
/home, must exist on the client prior to mounting. While the remote directory is
mounted, any files that were stored on the local system in the /home directory will be
inaccessible. They are still there; you just can't get to them while the remote directory
is mounted. Next, if you are running a firewall, it will probably block NFS traffic. If
you are having problems with NFS, this is one of the first things you should check.
File ownership can also create some surprises. User and group IDs should be
consistent among systems using NFS, i.e., each user will have identical IDs on all
systems. Finally, be aware that root privileges don't extend across NFS shared systems
(if you have configured your systems correctly). So if, as root, you change the
directory (cd) to a remotely mounted filesystem, don't expect to be able to look at
every file. (Of course, as root you can always use su to become the owner and do all
the snooping you want.) Details for the syntax and options can be found in the nfs(5),
exports(5), fstab(5), and mount(8) manpages.
9.2.3 SSH
To run software across a cluster, you'll need some mechanism to start processes on
each machine. In practice, a prerequisite is the ability to log onto each machine within
the cluster. If you need to enter a password for each machine each time you run a
program, you won't get very much done. What is needed is a mechanism that allows
logins without passwords.
This boils down to two choices—you can use remote shell (RSH) or secure shell
(SSH). If you are a trusting soul, you may want to use RSH. It is simpler to set up with
less overhead. On the other hand, SSH network traffic is encrypted, so it is safe from
snooping. Since SSH provides greater security, it is generally the preferred approach.
SSH provides mechanisms to log onto remote machines, run programs on remote
machines, and copy files among machines. SSH is a replacement for ftp, telnet, rlogin,
rsh, and rcp. A commercial version of SSH is available from SSH Communications
Security (http://www.ssh.com), a company founded by Tatu Ylönen, an original
developer of SSH. Or you can go with OpenSSH, an open source version from
http://www.openssh.org.
OpenSSH is the easiest since it is already included with most Linux distributions. It
has other advantages as well. By default, OpenSSH automatically forwards the
DISPLAY variable. This greatly simplifies using the X Window System across the
cluster. If you are running an SSH connection under X on your local machine and
execute an X program on the remote machine, the X window will automatically open
on the local machine. This can be disabled on the server side, so if it isn't working,
that is the first place to look.
There are two sets of SSH protocols, SSH-1 and SSH-2. Unfortunately, SSH-1 has a
serious security vulnerability. SSH-2 is now the protocol of choice. This discussion
will focus on using OpenSSH with SSH-2.
Before setting up SSH, check to see if it is already installed and running on your
system. With Red Hat, you can check to see what packages are installed using the
package manager.
[root@fanny root]# rpm -q -a | grep ssh
openssh-3.5p1-6
openssh-server-3.5p1-6
openssh-clients-3.5p1-6
openssh-askpass-gnome-3.5p1-6
openssh-askpass-3.5p1-6
This particular system has the SSH core package, both server and client software as
well as additional utilities. The SSH daemon is usually started as a service. As you
can see, it is already running on this machine.
[root@fanny root]# /sbin/service sshd status
sshd (pid 28190 1658) is running...
Of course, it is possible that it wasn't started as a service but is still installed and
running. You can use ps to double check.
[root@fanny root]# ps -aux | grep ssh
root 29133 0.0 0.2 3520 328 ? S Dec09 0:02 /usr/sbin/sshd
...
Again, this shows the server is running.
With some older Red Hat installations, e.g., the 7.3 workstation, only the client
software is installed by default. You'll need to manually install the server software. If
using Red Hat 7.3, go to the second install disk and copy over the file
RedHat/RPMS/openssh-server-3.1p1-3.i386.rpm. (Better yet, download the latest
version of this software.) Install it with the package manager and then start the
service.
[root@james root]# rpm -vih openssh-server-3.1p1-3.i386.rpm
Preparing... ########################################### [100%]
1:openssh-server ########################################### [100%]
[root@james root]# /sbin/service sshd start
Generating SSH1 RSA host key: [ OK ]
Generating SSH2 RSA host key: [ OK ]
Generating SSH2 DSA host key: [ OK ]
Starting sshd: [ OK ]
When SSH is started for the first time, encryption keys for the system are generated.
Be sure to set this up so that it is done automatically when the system reboots.
Configuration files for both the server, sshd_config, and client, ssh_config, can be
found in /etc/ssh, but the default settings are usually quite reasonable. You shouldn't
need to change these files.
9.2.3.1 Using SSH
To log onto a remote machine, use the command ssh with the name or IP address of
the remote machine as an argument. The first time you connect to a remote machine,
you will receive a message with the remote machine's fingerprint, a string that
identifies the machine. You'll be asked whether to proceed or not. This is normal.
[root@fanny root]# ssh amy
The authenticity of host 'amy (10.0.32.139)' can't be established.
RSA key fingerprint is 98:42:51:3e:90:43:1c:32:e6:c4:cc:8f:4a:ee:cd:86.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'amy,10.0.32.139' (RSA) to the list of known hosts.
root@amy's password:
Last login: Tue Dec 9 11:24:09 2003
[root@amy root]#
The fingerprint will be recorded in a list of known hosts on the local machine. SSH
will compare fingerprints on subsequent logins to ensure that nothing has changed.
You won't see anything else about the fingerprint unless it changes. Then SSH will
warn you and query whether you should continue. If the remote system has changed,
e.g., if it has been rebuilt or if SSH has been reinstalled, it's OK to proceed. But if you
think the remote system hasn't changed, you should investigate further before logging
in.
Notice in the last example that SSH automatically uses the same identity when
logging into a remote machine. If you want to log on as a different user, use the -l
option with the appropriate account name.
You can also use SSH to execute commands on remote systems. Here is an example
of using date remotely.
[root@fanny root]# ssh -l sloanjd hector date
sloanjd@hector's password:
Mon Dec 22 09:28:46 EST 2003
Notice that a different account, sloanjd, was used in this example.
To copy files, you use the scp command. For example,
[root@fanny root]# scp /etc/motd george:/root/
root@george's password:
motd 100% |*****************************| 0 00:00
Here file /etc/motd was copied from fanny to the /root directory on george.
In the examples thus far, the system has asked for a password each time a command
was run. If you want to avoid this, you'll need to do some extra work. You'll need to
generate a pair of authorization keys that will be used to control access and then store
these in the directory ~/.ssh. The ssh-keygen command is used to generate keys.
[sloanjd@fanny sloanjd]$ ssh-keygen -b1024 -trsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/sloanjd/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/sloanjd/.ssh/id_rsa.
Your public key has been saved in /home/sloanjd/.ssh/id_rsa.pub.
The key fingerprint is:
2d:c8:d1:e1:bc:90:b2:f6:6d:2e:a5:7f:db:26:60:3f sloanjd@fanny
[sloanjd@fanny sloanjd]$ cd .ssh
[sloanjd@fanny .ssh]$ ls -a
. .. id_rsa id_rsa.pub known_hosts
The options in this example are used to specify a 1,024-bit key and the RSA
algorithm. (You can use DSA instead of RSA if you prefer.) Notice that SSH will
prompt you for a pass phrase, basically a multi-word password.
Two keys are generated, a public and a private key. The private key should never be
shared and resides only on the client machine. The public key is distributed to remote
machines. Copy the public key to each system you'll want to log onto, renaming it
authorized_keys2.
[sloanjd@fanny .ssh]$ cp id_rsa.pub authorized_keys2
[sloanjd@fanny .ssh]$ chmod go-rwx authorized_keys2
[sloanjd@fanny .ssh]$ chmod 755 ~/.ssh
If you are using NFS, as shown here, all you need to do is copy and rename the file in
the current directory. Since that directory is mounted on each system in the cluster, it
is automatically available.
If you used the NFS setup described earlier, root's home
directory, /root, is not shared. If you want to log in as root
without a password, manually copy the public keys to the target
machines. You'll need to decide whether you feel secure setting
up the root account like this.
You will use two utilities supplied with SSH to manage the login process. The first is
an SSH agent program that caches private keys, ssh-agent. This program stores the
keys locally and uses them to respond to authentication queries from SSH clients. The
second utility, ssh-add, is used to manage the local key cache. Among other things, it
can be used to add, list, or remove keys.
[sloanjd@fanny .ssh]$ ssh-agent $SHELL
[sloanjd@fanny .ssh]$ ssh-add
Enter passphrase for /home/sloanjd/.ssh/id_rsa:
Identity added: /home/sloanjd/.ssh/id_rsa (/home/sloanjd/.ssh/id_rsa)
(While this example uses the $SHELL variable, you can substitute the actual name of
the shell you want to run if you wish.) Once this is done, you can log in to remote
machines without a password.
This process can be automated to varying degrees. For example, you can add the call
to ssh-agent as the last line of your login script so that it will be run before you make
any changes to your shell's environment. Once you have done this, you'll need to run
ssh-add only when you log in. But you should be aware that Red Hat console logins
don't like this change.
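The automation described above might look like this at the end of a login script such as ~/.bash_profile. A sketch only; the SSH_AGENT_PID guard is our addition, not part of the original instructions:

```shell
# Sketch: tail of a login script (e.g. ~/.bash_profile). Starts
# ssh-agent as the last step of login, as described above. The
# SSH_AGENT_PID guard (an assumption, not from the text) prevents
# nested shells from starting a second agent.
if [ -z "$SSH_AGENT_PID" ]; then
    exec ssh-agent $SHELL
fi
```

After logging in, run ssh-add once to cache your passphrase; note the caveat above about Red Hat console logins.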
You can find more information by looking at the ssh(1), ssh-agent(1), and ssh-add(1)
manpages. If you want more details on how to set up ssh-agent, you might look at
SSH, The Secure Shell by Barrett and Silverman, O'Reilly, 2001. You can also find
The SupeThe SupeThe SupeThe Super Computerr Computerr Computerr Computer
AISSMS “College Of Engineering”AISSMS “College Of Engineering”AISSMS “College Of Engineering”AISSMS “College Of Engineering” - 65 -
scripts on the Internet that will set up a persistent agent so that you won't need to
rerun ssh-add each time.
9.2.4 Hosts file and name services
Life will be much simpler in the long run if you provide appropriate name services.
NIS is certainly one possibility. At a minimum, don't forget to edit /etc/hosts for your
cluster. At the very least, this will reduce network traffic and speed up some software.
And some packages assume it is correctly installed. Here are a few lines from the host
file for amy:
127.0.0.1 localhost.localdomain localhost
10.0.32.139 amy.wofford.int amy
10.0.32.140 basil.wofford.int basil
...
Notice that amy is not included on the line with localhost. Specifying the host name as
an alias for localhost can break some software.
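The warning above can be checked mechanically. A small sketch (the function name and the temporary file are ours, for illustration):

```shell
# Sanity check (a sketch) for the advice above: the machine's own
# hostname should not appear as an alias on the 127.0.0.1 line.
# Takes the hosts file and the hostname as arguments so it can be
# pointed at any file.
check_hosts() {
    if grep '^127\.0\.0\.1' "$1" | grep -qw "$2"; then
        echo "WARNING: $2 is aliased to localhost in $1"
    else
        echo "OK: $2 is not on the localhost line"
    fi
}

# Example against a copy of the lines shown above:
printf '127.0.0.1 localhost.localdomain localhost\n10.0.32.139 amy.wofford.int amy\n' > /tmp/hosts.test
check_hosts /tmp/hosts.test amy
```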
9.3 Working with Parallex
Once the master has been configured and all nodes are up, working with Parallex to
utilize all your available resources is very easy. Follow these simple steps to use the
power of all nodes that are up.
• Compile your code and place it in $PARALLEX_DIR/bin/
You can use the Makefile to do this for you.
# make main_app
• After the application has compiled without any errors, first start Parallex's
network monitoring tool
# netmon
• Parallex will now know which machines are up and running in your cluster.
To read information about the machines, run the following command:
# parastat
• To get a graphical representation of CPU usage and other statistics for your
slave machines, run the Gkrellm configuration script.
# gkrllm_config
• To run the main application on the Parallex engine, run the master script
followed by the full path of the executable binary compiled from your source
application and a list of arguments indicating the data set to be parallelized,
as follows:
# parallex ../bin/my_app 1 99999999
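The steps above can be collected into a single session sketch. netmon, parastat and parallex are the Parallex tools named in this section; echo keeps this a dry run that only prints the sequence, since the tools need a running cluster:

```shell
# Dry-run sketch of a Parallex session, following the steps above.
# netmon, parastat and parallex are Parallex's own tools; "echo" only
# prints the commands, so this can be read without a live cluster.
run_job() {
    app="$1"; shift
    echo "netmon"                # 1. start the network monitoring tool
    echo "parastat"              # 2. see which machines are up
    echo "parallex $app $*"      # 3. dispatch the binary with its data range
}

run_job ../bin/my_app 1 99999999
```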
Chapter 10. Conclusion
Many solutions exist for running applications on distributed/parallel
systems. Parallex, however, is a single complete solution that takes care of all
issues related to High Performance Computing, from cluster boot-up to
management of processes on remote machines.
Parallex is also unique in that it supports both dedicated and shared
cluster architectures. Because Parallex efficiently utilizes the available
computing resources, the cluster requires neither any special kind of
hardware nor homogeneous (i.e., identical) machines, resulting
in significant cost savings.
Parallex, in its current state, is intended for use in educational institutes and small
to medium sized businesses. However, it can be easily adapted for a range of
applications, from mathematical and scientific computing to 3D rendering.
Because of its simplicity, adaptability, ease of use and relatively low cost of
ownership, we can conclude that Parallex is a poor man's Super Computer.
Chapter 11. Future Enhancements
Handling binary-level parallelism: Given the source code, the master can
successfully break the application up for processing in parallel. To handle binary
executables, however, we use openMosix, a Linux kernel extension for
single-system-image clustering. Processes originating from any one node, if that node
is too busy compared to others, can migrate to any other node. OpenMosix
continuously attempts to optimize the resource allocation. openMosix implements the
distributed computing concept by extending the kernel, so it is
transparent to all applications. We are trying to include openMosix so that we can add
load balancing to parallel processing.
Compatibility with non-Unix platforms: At present Parallex can run on multiple
platforms, with the restriction that all must be Unix based (Linux, FreeBSD,
NetBSD, Plan 9, Darwin, etc.). A further restriction is that applications to be run
on Parallex must be compatible with all of the above systems. To support
other platforms, one solution is to run a virtual machine with one of the
above supported platforms as the guest OS.
Chapter 12. References
[1] Culler, David. Parallel Computer Architecture: A Hardware/Software Approach.
Morgan Kaufmann Publishers, San Francisco, CA.
[2] Dowd, Kevin and Charles Severance. High Performance Computing, 2nd Edition.
O'Reilly and Associates, Sebastopol, CA.
[3] Dongarra, Jack. Sourcebook of Parallel Computing. Morgan Kaufmann
Publishers, San Francisco, CA.
[4] Sloan, Joseph. High Performance Linux Clusters. O'Reilly Media Inc., CA.
[5] Lastovetsky, Alexey L. Parallel Computing on Heterogeneous Networks.
[6] Foster, Ian. Designing and Building Parallel Programs.
[7] Hariri, Salim and Manish Parashar. Tools and Environments for Parallel and
Distributed Computing.
[8] Baer, Troy. Performance Tuning Techniques for Clusters.
[9] Los Alamos National Laboratory. Introduction to Parallel Computing.
[10] http://bproc.sourceforge.net/bproc.html: BProc homepage.
[11] www.beowulf.org: homepage of the Beowulf project.
[12] Gropp, William, Ewing Lusk and Thomas Sterling. Beowulf Cluster Computing
with Linux, Second Edition.
[13] May, John M. Parallel I/O for High Performance Computing.
[14] Allan, R.J., S.J. Andrews and M.F. Guest. High Performance Computing and
Beowulf Clusters.
[15] www.kernel.org: kernel sources.
APPENDIX- A. BProc
BProc (Beowulf Distributed Process Space)
The Beowulf Distributed Process Space (BProc) is a set of kernel modifications,
utilities and libraries which allow a user to start processes on other machines in a
Beowulf-style cluster. Remote processes started with this mechanism appear in the
process table of the front-end machine in the cluster. This allows remote process
management using the normal UNIX process control facilities. Signals are
transparently forwarded to remote processes and exit status is received using the usual
wait() mechanisms.
BProc:
• Manages a single process space across machines
• Is responsible for process startup and management
• Provides commands for starting processes, copying files to nodes, etc.
BProc is a Linux kernel modification which provides:
• A single system image for process control in a cluster
• Process migration for creating processes in a cluster
In a BProc cluster, there is a single master and many slaves:
• Users (including root) only log into the master
• The master’s process space is the process space for the cluster
• All processes in the cluster are:
  • Created from the master
  • Visible on the master
  • Controlled from the master
A1.0 Motivation
rsh and rlogin are a lousy way to interact with the machines in a cluster.
Being able to log into any machine in the cluster requires a large amount
of software and configuration to be present on each machine. You will need things like
shells for people to log in. You will need an up-to-date password database. You'll need
all the little programs that people expect to see on a UNIX system for people to be
comfortable using it. You'll probably also need all the setup scripts and
associated configuration information to get the machines to the point where they're
actually usable by the users. That is an awful lot of configuration.
With a large number of machines, it's also very easy for the users to make a mess.
Runaway processes are a problem.
The goal of BProc is to change the model of the cluster from a pile of PCs to a
single machine with a collection of network-attached compute resources, and, of
course, to do away with rsh and rlogin in the cluster environment.
Once we do away with interactive logins, two basic needs remain. We need
a way to start processes on remote machines and, most importantly, we need a way to
monitor and control what is going on on the remote machines.
BProc provides process migration mechanisms which allow a process to place
copies of itself on remote machines via a remote fork system call. When creating
remote processes via this mechanism, the child processes are all visible in the front
end's process tree.
The central idea in BProc is the idea of a distributed process ID (PID) space.
Every instance of Linux has a process space - a pool of process IDs and a process tree.
BProc takes the process space of the front end machine and allows portions of it to
exist on the other machines in the cluster. The machine distributing pieces of its
process space is the master machine and the machines accepting pieces of it to run are
the slave machines.
A2.0 Process Migration
• BProc provides a process migration system to place processes on
other nodes in the cluster
• Process migration on BProc is not
• Transparent
• Preemptive
• A process must call the migration system call in order to move
• Process migration on BProc is
• Very fast (1.9s to place a 16MB process on 1024 nodes)
• Scalable
• It can create many copies of the same process (e.g. MPI startup) very efficiently
  • O(log #copies)
A2.1 Process migration does preserve
• The contents of memory and memory-related metadata
• CPU state (registers)
• Signal handler state
A2.2 Process migration does not preserve
• Shared memory regions
• Open files
• SysV IPC resources
• Just about anything else that isn’t “memory”
A3.0 Running on a Slave Node
• BProc is a process management system
• All other system calls are handled locally on the slave node
• BProc does not impose any extra overhead on non-process related
system calls
• File and Network I/O are always handled locally
• Calling open() will not cause contact with the master node
• This means network and file I/O are as fast as they can be
A4.0 Implementation:
BProc consists of four basic pieces. On the master node, there are "ghost processes"
which are place holders in the process tree that represent remote processes. There is
also the master daemon which is the message router for the system and is also the
piece which maintains state information about which processes exist where. On the
slave nodes there is process ID masquerading which is a system of lying to processes
there so that they appear (to themselves) to be in the master's process space. There is
also a simple daemon on the slave side which is mostly just a message pipe between
the slave's kernel and the network.
A4.1 Ghost Processes
Code reuse is good. BProc tries to recycle as much of the kernel's existing process
infrastructure as possible. The UNIX process model is well thought out and certainly
well understood. All the details of the UNIX model have been hammered out and it
works well. Rather than trying to change or simplify it for BProc, BProc keeps it
entirely. Rather than creating some new kind of semi-bogus process tree, BProc uses
the existing tree and fills the places which represent remote processes with light
weight "ghost" processes.
Ghost processes are normal processes except that they lack a memory space and open
files. They resemble kernel threads like kswapd and kflushd. It is possible for ghosts
to wake up and run on the front end. They have their own status (i.e. sleeping,
running) which is independent of the remote processes they represent. Most of the
time, however, they sleep and wait for the remote process to request one of the few
operations which are performed on their behalf.
Ghost processes mirror portions of the status of the remote process. The status
includes information such as the process state and the amount of CPU time used so
far. This alternate status is what gets presented to user space in the procfs filesystem.
This status gets updated on demand (via a request to the real process) and no more
often than every 5 seconds.
Ghosts catch and forward signals to the remote process. Since ghosts are kernel
threads (not running in user space), they can catch and forward SIGKILL and
SIGSTOP. There is no way to get rid of a ghost process without the remote process
exiting.
Ghosts perform certain operations on behalf of the real processes they represent. In
particular they do fork() and wait(). If a process on a remote machine decides to fork,
a new process ID must be allocated for it in the master's process space. Also, we
should see a new ghost on the front end when the remote process forks. Having the
ghost call fork() accomplishes both of these nicely. Likewise, the ghost process will
also clean up the process tree on the front end by performing wait()s when necessary.
Finally, the ghost will exit() with the appropriate status when the remote process it
represents exits. Since the ghost is a kernel thread, it can accurately reflect the exit
status of the remote process including states such as killed by a signal and core
dumped.
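The fork()/wait() semantics the ghosts preserve can be observed locally with ordinary shell job control, where a background subshell stands in for the remote process. This is a local analogy only, not BProc itself:

```shell
# Local sketch of the fork()/wait() pattern that BProc preserves across
# machines: a background subshell stands in for the remote process, and
# the shell's wait builtin collects its exit status, just as the ghost's
# wait() does on the front end.
( exit 42 ) &                           # "remote" child exiting with status 42
pid=$!
wait "$pid" && status=$? || status=$?   # collect the child's exit status
echo "child exited with status $status"
```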
A4.2 Process ID Masquerading
The slave nodes accept pieces of the master's process space. The problem here is
that although a process might move to a different machine, it should not appear (to
that process) that it has left the process space of the front end. That means things like
the process ID can't change, and system calls like kill() should function as if the
process were still on the front end. That is, we should not be able to send signals
across process spaces to the other processes on the slave node.
Since the slave doesn't control the process space of the processes it's accepting, not all
operations can be handled entirely locally either. fork() is a good example.
The solution that BProc uses is to ignore the process ID that a process gets when it's
created on the slave side. BProc attaches a second process ID to the process and
modifies the process ID related system calls to essentially lie to the process about
what its ID is.
Having this extra tag also allows the slave daemon to differentiate the process from
the other processes on the system when performing process ID related system calls.
A4.3 The Daemons
The master and slave daemons are the glue connecting the ghosts and the real
processes together.
A4.4 Design Principles
BProc's design is based on the following basic principles.
A4.5 Code reuse is good
BProc uses place holders called ghosts in the normal UNIX process tree on the front
end to represent remote processes. The parent-child relationships are a no-brainer that
way, and so is handling signals, wait, etc.
A4.6 Code reuse is really good
Code reuse is even more important in user space, since user-space programs seem to
change very often. To avoid having to write its own set of process-viewing utilities
like ps and top, BProc presents all the information about remote processes in the
procfs file system, just like the system does for normal processes. As long as we keep
up with changes in the procfs file system, all existing and future process
viewing/control utilities will continue to work.
A4.7 The System must be bullet proof! (from user space)
Processes can't escape or confuse the management system. Ghosts need to properly
forward all signals including SIGKILL and SIGSTOP. There is no way for a ghost to
exit without the process it represents also exiting.
A4.8 Kernels shouldn't talk on the network.
The kernel is a very bad place to screw up. Try to keep as much as possible
outside of kernel space. This includes message routing and all the information about
the current state of the machine.
A4.9 Minimum knowledge
If a piece of the system doesn't really need to know something, don't let it know it.
The master daemon is the only piece that knows where the processes actually exist.
The kernel layers only have a notion of processes that are here or not here. Slaves
don't know what node number they are.
In brief:
• All processes are started from the master with process migration
• All processes remain visible on the master
• No runaways
• Normal UNIX process control works for ALL processes in the cluster
• No need for direct interaction: there is no need to log into a node to control
what is running there
• No software is required on the nodes except the BProc slave daemon
• ZERO software maintenance on the nodes!
• Diskless nodes without NFS root
• Reliable nodes
A4.10 Screen Shots
Every self-respecting piece of software provides a screen shot of some kind. For
BProc we have a shot of top. Note the CPU states line. cpumunch is a stupid little
program that just eats up CPU time on remote nodes.
3:08pm up 2:25, 7 users, load average: 0.13, 0.07, 0.07
175 processes: 46 sleeping, 129 running, 0 zombie, 0 stopped
CPU states: 12798.7% user, 8.3% system, 0.0% nice, 0.0% idle
Mem: 128188K av, 57476K used, 70712K free, 23852K shrd, 17168K buff
Swap: 130748K av, 0K used, 130748K free
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
1540 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch
1541 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch
1542 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch
1543 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch
1544 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch
1545 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch
1546 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
1547 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
1548 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
1549 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
1550 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
1551 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
1552 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
1553 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch
The processes here appear swapped because the ghosts don't have a memory space
and procfs doesn't mirror remote memory sizes.
APPENDIX- B. POV - Ray
B1.0 What is POV-Ray?
POV-Ray™ is short for the Persistence of Vision™ Raytracer, a tool for producing
high-quality computer graphics. POV-Ray™ is copyrighted freeware; that is to say,
we, the authors, retain all rights and copyright over the program, but we permit
you to use it for no charge, subject to the conditions stated in our license, which
should be in the documentation directory as povlegal.doc.
Without a doubt, POV-Ray is the world's most popular raytracer. From our website
alone we see well over 100,000 downloads per year, and this of course doesn't count
the copies obtained from our mirror sites, other internet sites, on CD-ROM, or
from persons sharing their copies.
The fact that it is free helps a lot in this area, of course, but there's more to it than that.
There are quite a few other free ray tracers and renderers available. What makes this
program different?
The answers are too numerous to detail in full here. Suffice it to say that POV-Ray
has the right balance of power and versatility to satisfy extremely experienced and
competent users, while at the same time not being so intimidating as to completely
scare new users off.
Of course, the most important factor is image quality, and in the right hands, POV-
Ray has it. We, the developers, have seen images that were rendered using our
software that we at first thought were photographs - they were that realistic. (Note that
photo-realism is an advanced skill; one that takes some practice to develop).
B1.1 What is POV-Ray for Unix?
POV-Ray for Unix is essentially a version of the POV-Ray rendering engine prepared
for running on a Unix or Unix-like operating system (such as GNU/Linux). It contains
all the features of POV-Ray described in chapters 2 and 3 of the documentation, plus
a few others specific to UNIX and GNU/Linux systems. These additional features do
not affect the core rendering code. They only make the program suitable for running
under a Unix-based system, and provide the user with Unix-specific display
capabilities. For instance, POV-Ray for UNIX can use the X Window System to
display the image it is rendering. On GNU/Linux machines, it can also display the
image directly on the console screen using the SVGA library.
POV-Ray for Unix uses the same scheme as the other supported platforms to create
ray-traced images. The POV-Ray input is platform-independent, as it is using text
files (POV-Ray scripts) to describe the scene: camera, lights, and various objects.
B2.0 Available distributions
There are two official distributions of POV-Ray for Unix available:
• Source package: this package contains all the source files and Makefiles
required for building POV-Ray. Building the program from source should
work on most Unix systems. The package uses a configuration mechanism to
detect the appropriate settings in order to build POV-Ray on your own platform.
All required support libraries are included in the package. See the INSTALL
file of the source package for details.
• Linux binary package: this package contains a compiled version of POV-Ray
for x86-compatible platforms running the GNU/Linux operating system. A
shell script for easy installation is also included. Further details are given in
the README file of this package.
Both distributions are available for download at the POV-Ray website and on the
POV-Ray FTP server (ftp.povray.org).
B3.0 Configuration
All official versions of POV-Ray for Unix come with procedures for correctly
installing and configuring POV-Ray. These explanations are for reference.
B3.1.1 The I/O Restrictions configuration file
When POV-Ray starts it reads the configuration for the I/O Restriction feature from
the povray.conf files. See the I/O Restrictions Documentation for a description of
these files.
B3.1.2 The main POV-Ray INI file
When starting, POV-Ray for UNIX searches for an INI file containing default
configuration options. The details can be found in the INI File Documentation.
B3.1.3 Starting a Render Job
Starting a render of any scene file is as simple as running povray from the
command line with the scene file name as an argument. This will work with either a
POV file or an INI file (as long as it has an associated POV file). See Understanding
File Types. The scene is rendered with the current POV-Ray 3 options (see
Understanding POV-Ray Options).
Note: One of the more common errors new users make is turning off the display
option. The Display option (+d) is ON by default. If you turn this OFF in the INI
file or on the command line, POV-Ray will not display the file as you render.
Please also note that POV-Ray for Unix will write the output file to a .png by default.
There is no way to 'save the render window' after rendering is completed. If you
turned file output off before starting the render, and change your mind, you will have
to start the rendering all over again. We recommend that you just leave file output on
all the time.
B3.2.1 X Window display
When the X Window display is used, the rendered image is displayed in a graphics
window. During rendering, the window will be updated after every scanline has been
rendered, or sooner if the rendering is taking a long time. To update it sooner you can
click any mouse button in the window or press (almost) any key. Pressing <CTRL-R>
or <CTRL-L> during rendering will refresh the whole screen. If you have the
Exit_Enable or +X flag set, pressing 'q' or 'Q' at any time during the rendering will
stop POV-Ray rendering and exit. The rendering will pause when complete if the
Pause_When_Done (or +P) flag is set. To exit at this point, press the 'q' or 'Q' key or
click any mouse button in the window.
POV-Ray 3.6 includes a color icon in the program if it was compiled with libXpm
(which is available on most platforms where the X Window System is installed).
Whether this icon is used for the render view window depends on the window
manager being used (KDE, Gnome, fvwm, ...). POV-Ray also comes with a separate
color icon (xpovicon.xpm) for use with window managers that can use external icons.
For instance, to have fvwm use this icon, copy the icon file to one of the directories
pointed to by PixmapPath (or ImagePath), which is defined in your $HOME/.fvwmrc.
Then, add the following line in $HOME/.fvwmrc:
Style "Povray" Icon xpovicon.xpm
and re-start the X Window server (re-starting fvwm will not be enough). Using this
icon with another window manager may require a different procedure.
Documentation of the special command line options to configure the X Window
display can be found in Special Command-Line Options.
B3.2.2 SVGAlib display
For GNU/Linux systems that don't have the X Window System installed, or for those
Linux users who prefer to run on the console, it is possible to use the SVGA library to
display directly to the screen. For SVGAlib display, the povray binary must be
installed as a setuid root executable. If POV-Ray does not use SVGAlib display, first
try (as root):
chown root.root povray
chmod 4755 povray
Note: Doing this may have serious security implications. Running POV-Ray as
root or through 'sudo' might be a better idea.
If it still doesn't work, then make sure SVGAlib is installed on your machine and
works properly. Anything that can at least use the 320x200x256 mode (i.e. regular
VGA) should be fine, although modes up to 1280x1024x16M are possible. If you do
not have root privileges or can't have the system admin install POV-Ray, then you
must use the X Window or text display, which do not require any special system
privileges to run. If you are using a display resolution that is lower than what you are
rendering, the display will be scaled to fit as much of the viewing window as possible.
B3.3.0 Output file formats
The default output file format of POV-Ray for Unix is PNG (+fn). This can be
changed at runtime by setting the Output_File_Type or +fx option. Alternatively, the
default format can be changed at compile time by setting
DEFAULT_FILE_FORMAT in the config.h file located in the unix/ directory.
Other convenient formats on Unix systems might be PPM (+fp) and TGA (+ft). For
more information about output file formats see File Output Options.
If you are generating histogram files (see CPU Utilization Histogram) in the CSV
format (comma-separated values), then the units of time are tens of microseconds
(10 × 10^-6 s), and each grid block can store times up to 12 hours.
To interrupt a rendering in progress, you can use CTRL-C (SIGINT), which will
allow POV-Ray to finish writing out any rendered data before it quits. When graphics
display mode is used, you can also press the 'q' or 'Q' keys in the rendering preview
window to interrupt the trace if the Test_Abort (or +X) flag is set.
B4.0 Rendering the Sample Scenes
POV-Ray for Unix comes with a set of shell scripts to automatically render the
sample scenes coming with POV-Ray.
These shell scripts are usually installed in /usr/local/share/povray-3.6/scripts. They
require a bash compatible shell. There are three scripts that are supposed to be called
by the user.
• allscene.sh:
renders all stills. The syntax is:
allscene.sh [log] [all] [-d scene_directory] [-o output_directory] [-h html_file]
If html_file is specified, an HTML listing of the rendered scenes is generated. If
ImageMagick is installed, the listing will also contain thumbnails of the
rendered images.
• allanim.sh:
renders all animations. The syntax is:
allanim.sh [log] [-d scene_directory] [-o output_directory] [-h html_file]
If ffmpeg is installed the script will compile mpeg files from the rendered
animations.
• portfolio.sh:
renders the portfolio. The syntax is:
portfolio.sh [log] [-d scene_directory] [-o output_directory]
The portfolio is a collection of images illustrating the POV-Ray features and
include files coming with the package.
If the option log is specified, a log file with the complete text output from POV-Ray
is written (filename log.txt).
If scene_directory is specified, the sample scenes in this directory are rendered;
otherwise the scene directory is determined from the main povray INI file (usually
/usr/local/share/povray-3.6/scenes).
If output_directory is specified, all images are written to this directory; if it is not
specified, the images are written into the scene file directories. If those directories are
not writable, the images are written to the current directory. All other files (HTML files,
thumbnails) are written there as well.
To determine the correct render options the scripts analyze the beginning of the scene
files. They search for a comment of the form
// -w320 -h240 +a0.3
in the first 50 lines of the scene. The animation script may also use an INI file
with the same base name as the scene file. The allscene.sh script additionally accepts
the all option which, if specified, also renders scenes without such an options
comment (using default options).
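The option-comment detection the scripts perform can be sketched in plain shell. The file name /tmp/demo.pov and the sed pattern below are our own illustration, not the scripts' exact code:

```shell
# Create a minimal sample scene with an options comment (hypothetical file).
cat > /tmp/demo.pov <<'EOF'
// -w320 -h240 +a0.3
sphere { <0,0,0>, 1 }
EOF

# Extract the render options the way the scripts do: look for a
# "// -w..." comment within the first 50 lines of the scene file.
opts=$(head -n 50 /tmp/demo.pov | sed -n 's|^// *\(-w[0-9].*\)|\1|p' | head -n 1)
echo "$opts"
```

If no such comment is found, opts stays empty, which corresponds to the case the all option handles.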
B5.0 POV-Ray for Unix Tips
B5.1 Automated execution
POV-Ray for Unix is well suited for automated execution, for example, for rendering
diagrams displaying statistical data on a regular basis.
POV-Ray can also write its image output directly to stdout, so the image data
can be piped into another program for further processing. To do this, the special output
filename '-' must be specified. For instance:
povray -iscene.pov +fp -o- | cjpeg > scene.jpg
will pass the image data to the cjpeg utility which writes the image in the JPEG
format.
The text output of POV-Ray is always written to stderr; it can be redirected to a file
(using a Bourne-compatible shell) with:
povray [Options] 2> log.txt
For remote execution of POV-Ray, as for example in a rendering service on the web,
make sure you read and comply with the POV-Ray Legal Document.
B6.0 Understanding File Types
B6.1 POV Files
POV-Ray for Unix works with two types of plain text files. The first is the standard
POV-Ray scene description file. Although you may give files of this type any
legitimate file name, it is easiest if you give them the .pov extension. In this Help file,
scene description files are referred to as POV files.
The second type, the initialization file, is new to POV-Ray 3. Initialization files
normally have .ini extensions and are referred to in this help file as INI files.
B6.2 INI Files
An INI file is a text file containing settings for what used to be called POV-Ray
command-line options. It replaces and expands on the functions of the DEF files
associated with previous versions of POV-Ray. You can store a default set of options
in the main POV-Ray INI file, which is searched for in the following locations:
• The location defined by the POVINI environment variable. Set this
environment variable when you want to use an INI file at a custom location.
• ./povray.ini
• $HOME/.povray/3.6/povray.ini
• PREFIX/etc/povray/3.6/povray.ini (PREFIX by default is /usr/local)
For backwards compatibility with version 3.5, POV-Ray 3.6 also attempts to read the
main INI file from the old locations when none is found at the places above:
• $HOME/.povrayrc
• PREFIX/etc/povray.ini (PREFIX by default is /usr/local)
Note: Use of these locations is deprecated; they will not be available in future
versions.
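The search order above can be sketched as a short shell loop. The scratch directory and the "found:" prefix are illustrative only; PREFIX is assumed to be /usr/local:

```shell
# Scratch directory with a local povray.ini, so the second candidate matches.
mkdir -p /tmp/povdemo && cd /tmp/povdemo && touch povray.ini

# Probe the documented locations in order and report the first that exists.
hit=""
for f in "${POVINI:-}" ./povray.ini \
         "$HOME/.povray/3.6/povray.ini" \
         /usr/local/etc/povray/3.6/povray.ini; do
  [ -n "$f" ] && [ -f "$f" ] && { hit="found: $f"; break; }
done
echo "$hit"
```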
Any other INI file can be specified by passing the INI file name on the command line.
One of the options you can set in the INI file is the name of an input file. You can
specify the name of a POV file here. This way you can customize POV-Ray settings
for any individual scene file.
For instance, if you have a file called scene.pov, you can create a file scene.ini to
contain settings specific for scene.pov. If you include the option
'Input_File_Name=scene.pov' in scene.ini, and then run povray scene.ini, POV-Ray
will process scene.pov with the options specified in scene.ini.
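A minimal scene.ini along these lines might look as follows; Width, Height, and Antialias are ordinary POV-Ray INI options, chosen here only as examples:

```ini
; scene.ini -- per-scene settings for scene.pov
Input_File_Name=scene.pov
Width=640
Height=480
Antialias=On
```

Running povray scene.ini then renders scene.pov at 640x480 with antialiasing, without any command-line switches.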
Remember, though, that any options set on the command line when you activate an
INI file override any corresponding options in the INI file (see Understanding POV-
Ray Options). Also, any options you do not set in the INI file retain the values last set
by any other INI file, or as originally determined in povray.ini.
You can instruct POV-Ray to generate an INI file containing all the options active at
the time of rendering. This way, you can pass a POV file and its associated INI file on
to another person and be confident that they will be able to generate the scene exactly
the same way you did. See the section titled Using INI Files for more information
about INI files.
B6.2.1 INI File Sections
Sections are not files in themselves; they are portions of INI files. Sections are a
means of grouping multiple sets of POV-Ray options together in a single INI file, by
introducing them with a section label. Consider the following INI file, taken from the
POV-Ray 3 documentation:
; RES.INI
; This sample INI file is used to set resolution.
+W120 +H100 ; This section has no label.
; Select it with "RES"
[Low]
+W80 +H60 ; This section has a label.
; Select it with "RES[Low]"
[Med]
+W320 +H200 ; This section has a label.
; Select it with "RES[Med]"
[High]
+W640 +H480 ; Labels are not case sensitive.
; "RES[high]" works.
[Really High]
+W800 +H600 ; Labels may contain blanks.
If you select this INI file, the default resolution setting will be 120 x 100. As soon as
you select the [High] section, however, the resolution becomes 640 x 480.
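To see what a section label selects, this sketch extracts the options belonging to [Med] from a RES.INI-style file with awk. The parser is purely illustrative and is not how POV-Ray itself reads sections:

```shell
# A RES.INI-style file (section with a blank label omitted for brevity).
cat > /tmp/res.ini <<'EOF'
+W120 +H100
[Low]
+W80 +H60
[Med]
+W320 +H200
[High]
+W640 +H480
EOF

# Print only the options in the [Med] section (case-insensitive match,
# mimicking POV-Ray's label handling; illustrative parser only).
med_opts=$(awk -v sec="med" '
  /^\[/ { cur = tolower($0); gsub(/[\[\]]/, "", cur); next }
  cur == sec { print }
' /tmp/res.ini)
echo "$med_opts"
```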
B7.0 Special Command-Line Options
POV-Ray for Unix supports several special command-line options not recognized by
other versions. They follow the standards for programs that run under the X Window
System.
-display <display_name>
Display the preview on display_name rather than the default display. This is
meant to be used to send the display to a remote host. The normal display
option +d is still valid.
-geometry [WIDTHxHEIGHT][+XOFF+YOFF]
Render the image with WIDTH and HEIGHT as the dimensions, and locate
the window XOFF from the left edge and YOFF from the top edge of the
screen (or, if negative, from the right and bottom edges respectively). For
instance: -geometry 640x480+10+20 creates a display for a 640x480 image
placed at (10, 20) pixels from the top-left corner of the screen. The WIDTH
and HEIGHT, if given, override any previous +Wn and +Hn settings.
-help
Display the X Window System-specific options. Use -H by itself on the
command line to output the general POV-Ray options.
-icon
Start the preview window as an icon.
-title <window_title>
Override the default preview window title with window_title.
-visual <visual_type>
Use the deepest visual of visual_type, if available, instead of the automatically
selected visual. Valid visuals are StaticGray, GrayScale, StaticColor,
PseudoColor, TrueColor, or DirectColor.
Note: if you are supplying a filename with spaces in it, you will need to enclose
the filename itself within quotes.
Glossary of Terms and Acronyms
3D RENDERING
Creating 3D animations or 3D scenes.
BEOWULF CLUSTER
A high-performance cluster built with commodity off-the-shelf
hardware.
BINARY-LEVEL PARALLELISM
Parallelism at the instruction level.
BLAS
Basic Linear Algebra Subprograms (BLAS) is a de facto
application programming interface standard for publishing
libraries to perform basic linear algebra operations such as
vector and matrix multiplication.
BPROC
The Beowulf Distributed Process Space (BProc) is a set of kernel
modifications, utilities, and libraries that allow a user to start
processes on other machines in a Beowulf-style cluster.
CAT 5 CABLES
Category 5 cable, commonly known as Cat 5, is a twisted-pair
cable type designed for high signal integrity. This type of cable
is often used in structured cabling for computer networks such
as Ethernet, and is also used to carry many other signals such
as basic voice services, token ring, and ATM.
DHCP
Dynamic Host Configuration Protocol. It is used to assign IP
address leases to client machines.
DISTRIBUTED COMPUTING
Distributed computing is a form of computing for a collection
of independent machines that appears to its users as a single
coherent system.
FREEBSD
FreeBSD is a Unix-like free operating system descended from
AT&T UNIX via the Berkeley Software Distribution (BSD)
branch through the 386BSD and 4.4BSD operating systems.
FreeBSD has been characterized as "the unknown giant among
free operating systems." It is not a clone of UNIX, but works
like UNIX, with UNIX-compliant internals and system APIs.
IRIX
IRIX is an operating system developed by Silicon Graphics, Inc.
MIPS
MIPS (originally an acronym for Microprocessor without
Interlocked Pipeline Stages) is a RISC microprocessor
architecture developed by MIPS Technologies. MIPS designs
are currently used primarily in embedded systems such as the
Series2 TiVo, Windows CE devices, Cisco routers, and Foneras,
and in video game consoles such as the Nintendo 64 and the
Sony PlayStation, PlayStation 2, and PlayStation Portable.
MPI
Message Passing Interface. MPI is a library specification for
message passing, proposed as a standard by a broadly based
committee of vendors, implementers, and users.
NETBSD
NetBSD is a freely redistributable, open source version of the
Unix-derivative BSD computer operating system. Noted for its
portability and quality of design and implementation, it is often
used in embedded systems and as a starting point for the
porting of other operating systems to new computer
architectures.
NFS
Network File System. A network filesystem is a filesystem that
physically resides on one computer (the file server), which in
turn shares its files over the network with other computers on
the network (the clients).
PLAN 9
Plan 9 from Bell Labs is a distributed operating system,
primarily used as a research vehicle. It was developed as the
research successor to Unix by the Computing Sciences
Research Center at Bell Labs. Plan 9 is most notable for
representing all system interfaces, including those required for
networking and the user-interface, through the filesystem rather
than specialized interfaces.
POVRAY
Persistence of Vision Raytracer. A 3D rendering tool.
PVM
Parallel Virtual Machine. A software framework used to run
applications in parallel across networked machines.
RARP
Reverse Address Resolution Protocol. It is used to resolve an IP
address from a MAC address.
RPM
RPM Package Manager (originally Red Hat Package Manager,
abbreviated RPM) is a package management system. The
name RPM refers to two things: a software package file format,
and software packaged in this format. RPM was intended
primarily for Linux distributions; the RPM file format is the
baseline package format of the Linux Standard Base.
RSH
Remote shell protocol. It is used for remote login into client
machines.
SSI
Single System Image (SSI) clustering. Presenting the collection
of machines that make up a cluster as a single machine.
SSH
Secure Shell protocol. An encrypted replacement for RSH,
used to connect to and log in to remote machines on a
network.
TCP/IP
Transmission Control Protocol/Internet Protocol. This protocol
suite is used for reliable transmission of data between
computers and across different networks.
“Parallex – The Super Computer” Memorable Journey
Parallex’s First Prototype with two machines
“Parallex – The Super Computer” with Diskless Machines
Display of Parallex Master
At Parallex Stall with our Project Guide Prof. Anil J. Kadam
(Representing Computer Department)
All smiles: Chief Guest and Guest of Honour of Engineering Today
2008 at the Parallex stall
Explaining our “Parallex – Super computer”
(Our HOD madam at extreme right)
“Parallex – The Super Computer” Achievements
• FIRST in Intercollegiate National Level Event
“EXCELSIOR 08” project competition & exhibition.
• FIRST in National Level Students Technical Symposium
and Exposition “AISSMS Engineering Today 2008” Project
Competition.
• SECOND in National Level Students Technical Symposium
and Exposition “AISSMS Engineering Today 2008” Technical
Paper Presentation.
• FIRST in National Level Technical Event “Zion 2008” project
competition.
• Finalist in many national-level project competitions.
• Letter of Recommendation from our Head of Department and
support for setting up a “High Performance Computing”
laboratory (letter attached on next page).
Letter of Recommendation