Distributed Computing
Sudarsun Santhiappan
sudarsun@{burning-glass.com, gmail.com}
Burning Glass Technologies
Kilpauk, Chennai 600010
Technology is Changing...
Computational power doubles roughly every 18 months
Networking bandwidth and speed double roughly every 9 months
How do we tap the benefits of this technology?
Should we grow as an individual?
Should we grow as a team?
The Coverage Today
Parallel Processing
Multiprocessor or Multi-Core Computing
Symmetric Multiprocessing
Cluster Computing {PVM}
Distributed Computing {TAO, OpenMP}
Grid Computing {Globus Toolkit}
Cloud Computing {Amazon EC2}
Parallel Computing
It is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved in parallel.
Multi-Core, Multiprocessor SMP, Massively Parallel Processing (MPP) Computers
Is it easy to write a parallel program ?
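The divide-and-solve idea above can be sketched in a few lines of Python. This is only an illustration, not something from the original slides: the function names `partial_sum` and `parallel_sum_of_squares`, the problem (a sum of squares), and the default of 4 workers are all assumptions. A thread pool is used so the sketch runs anywhere; for CPU-bound work in CPython, `ProcessPoolExecutor` (same interface) is the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Solve one small sub-problem: sum of squares over [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into chunks, solve the chunks concurrently, combine."""
    # Hypothetical chunking policy: one roughly equal chunk per worker.
    step = max(1, (n + workers - 1) // workers)
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The chunks share no data, so they can be solved in any order.
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

The easy part is that the chunks are independent. Most of the difficulty in real parallel programs comes from sub-problems that are not.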
Cluster Computing
A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer
Operate in shared memory mode (mostly)
Tightly coupled with high-speed networking, mostly with optical fiber channels.
HA, Load Balancing, Compute Clusters
Can we Load Balance using DNS ?
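One way to explore the DNS question is a toy simulation of round-robin DNS, where the name server rotates its list of addresses on every lookup. The class `RoundRobinDNS`, the function `simulate`, and the 192.0.2.x addresses (a range reserved for documentation) are hypothetical and only stand in for a real name server.

```python
from collections import Counter, deque


class RoundRobinDNS:
    """Toy resolver that rotates its address list after every lookup."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def resolve(self):
        answer = list(self._records)
        self._records.rotate(-1)  # the next lookup starts one address later
        return answer


def simulate(addresses, lookups):
    """Count how many clients land on each server.

    Assumption: every client connects to the first address it receives.
    """
    dns = RoundRobinDNS(addresses)
    return Counter(dns.resolve()[0] for _ in range(lookups))


if __name__ == "__main__":
    print(simulate(["192.0.2.1", "192.0.2.2", "192.0.2.3"], 9))
```

In this idealized model the load spreads evenly. In practice, resolvers cache answers and DNS does not check server health, so the real distribution is uneven and a failed server can keep receiving traffic.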
Distributed Computing
Wikipedia: It deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime
Grid Computing
Wikipedia: A form of distributed computing whereby a super and virtual computer is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform large tasks.
pcwebopedia.com: Unlike conventional networks that focus on communication among devices, grid computing harnesses unused processing cycles of all computers in a network for solving problems too intensive for any stand-alone machine.
IBM: Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities. Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer.
Sun: Grid Computing is a computing infrastructure that provides dependable, consistent, pervasive and inexpensive access to computational capabilities.
Cloud Computing
Wikipedia: It is a style of computing in which dynamically scalable and often virtualised resources are provided as a service over the Internet.
Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)
Cloud services provide common business applications online, accessible from a web browser.
Examples: Amazon Elastic Compute Cloud (EC2), Google Apps
Hardware: IBM p690 Regatta
32 POWER4 CPUs (1.1 GHz)
32 GB RAM
218 GB internal disk
OS: AIX 5.1
Peak speed: 140.8 GFLOP/s*
Programming model: shared memory multithreading (OpenMP) (also supports MPI)
*GFLOP/s: billion floating point operations per second
Hardware: Pentium4 Xeon Cluster
270 Pentium4 XeonDP CPUs
270 GB RAM
8,700 GB disk
OS: Red Hat Linux Enterprise 3
Peak speed: 1.08 TFLOP/s*
Programming model: distributed multiprocessing (MPI)
*TFLOP/s: trillion floating point operations per second
Hardware: Itanium2 Cluster
schooner.oscer.ou.edu
New arrival!
56 Itanium2 1.0 GHz CPUs
112 GB RAM
5,774 GB disk
OS: Red Hat Linux Enterprise 3
Peak speed: 224 GFLOP/s*
Programming model: distributed multiprocessing (MPI)
*GFLOP/s: billion floating point operations per second
Vector Processing
It is based on array processors where the instruction set includes operations that can perform mathematical operations on data elements simultaneously
Example: Finding Scalar dot product between two vectors
Is vector processing a parallel computing model?
What are the limitations of Vector processing ?
Used extensively in video processing and games
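The dot product example can be sketched by emulating a vector unit in plain Python. Nothing here is real vector hardware: `vector_multiply` stands in for a single vector instruction, and the width of 4 lanes (`LANES`) is an assumption made for illustration.

```python
LANES = 4  # assumed vector width; real hardware widths vary


def vector_multiply(a_chunk, b_chunk):
    """Stand-in for ONE vector instruction: multiply all lanes together."""
    return [x * y for x, y in zip(a_chunk, b_chunk)]


def dot_scalar(a, b):
    """Conventional loop: one multiply per element."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_vector(a, b, lanes=LANES):
    """Dot product in chunks of `lanes` elements.

    Returns the result and the number of emulated vector multiplies issued.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    instructions = 0
    for start in range(0, len(a), lanes):
        products = vector_multiply(a[start:start + lanes], b[start:start + lanes])
        total += sum(products)  # reduction step combines the lanes
        instructions += 1
    return total, instructions


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print(dot_scalar(a, b), dot_vector(a, b))
```

With 8 elements and 4 lanes, the emulated vector version issues 2 multiply instructions where the scalar loop performs 8 multiplies. The final reduction still has to combine the lanes, which hints at one limitation of the model.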
Pipelined Processing
The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step.
This allows the computer's control circuitry to issue one instruction per slowest-step time, which is much shorter than the time needed to perform all steps at once.
A non-pipelined architecture is inefficient because some CPU components (modules) are idle while another module is active during the instruction cycle
Processors with pipelining are organized internally into stages that can work semi-independently on separate jobs
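The benefit of pipelining can be put into numbers with a small sketch. It assumes an idealized pipeline in which every stage takes exactly one cycle and there are no stalls or hazards; the function names are hypothetical.

```python
def cycles_without_pipeline(instructions, stages):
    """Each instruction passes through all stages before the next one starts."""
    return instructions * stages


def cycles_with_pipeline(instructions, stages):
    """The first instruction fills the pipeline; after that, one finishes per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


def speedup(instructions, stages):
    """Ratio of non-pipelined to pipelined cycle counts."""
    return cycles_without_pipeline(instructions, stages) / cycles_with_pipeline(
        instructions, stages
    )


if __name__ == "__main__":
    print(cycles_without_pipeline(100, 5))  # 500
    print(cycles_with_pipeline(100, 5))     # 104
    print(speedup(100, 5))
```

For 100 instructions on 5 stages the speedup is about 4.8, and it approaches the number of stages as the instruction count grows. Unequal stage times and stalls reduce it in real processors.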
Parallel Vs Pipelined Processing
[Figure: two schedules for processors P1 to P4, plotted against time. Colors: different types of operations performed. a, b, c, d: different data streams processed.]
Parallel processing
P1: a1 a2 a3 a4
P2: b1 b2 b3 b4
P3: c1 c2 c3 c4
P4: d1 d2 d3 d4
Less inter-processor communication
Complicated processor hardware
Pipelined processing
P1: a1 b1 c1 d1
P2: a2 b2 c2 d2
P3: a3 b3 c3 d3
P4: a4 b4 c4 d4
More inter-processor communication
Simpler processor hardware
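The two ways of assigning work to P1 through P4 contrasted above can be sketched as follows. The helper names are hypothetical, and the `handoffs` count is only a rough proxy for inter-processor communication: it counts how often consecutive steps of the same data stream run on different processors.

```python
STREAMS = "abcd"      # data streams from the figure
STEPS = (1, 2, 3, 4)  # operation types applied to each stream, in order


def parallel_assignment():
    """Each processor carries one data stream through every operation."""
    return {f"P{i + 1}": [f"{s}{k}" for k in STEPS] for i, s in enumerate(STREAMS)}


def pipelined_assignment():
    """Each processor performs one operation type on every data stream."""
    return {f"P{k}": [f"{s}{k}" for s in STREAMS] for k in STEPS}


def handoffs(assignment):
    """Count consecutive steps of a stream that run on different processors."""
    owner = {op: proc for proc, ops in assignment.items() for op in ops}
    return sum(
        owner[f"{s}{k}"] != owner[f"{s}{k + 1}"]
        for s in STREAMS
        for k in STEPS[:-1]
    )


if __name__ == "__main__":
    print(parallel_assignment(), handoffs(parallel_assignment()))
    print(pipelined_assignment(), handoffs(pipelined_assignment()))
```

In this sketch the parallel assignment needs no handoffs, while the pipelined one needs 12 (three per stream). In exchange, each pipelined processor only has to implement one type of operation.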
Data Dependence
Parallel processing requires NO data dependence between processors
Pipelined processing will involve inter-processor communication
[Figure: two timelines showing processors P1 to P4 against time]
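Typical Computing Elements
[Figure: layered view of a multi-processor computing system. Layers: Applications, Programming paradigms, Threads Interface, Operating System (Microkernel), Hardware (Multi-Processor Computing System with processors P). Legend: P = Processor, Thread, Process.]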
Why Parallel Processing ?
Computation requirements are ever increasing, for instance in visualization, distributed databases, simulations, and scientific prediction (e.g. climate, earthquakes).
Sequential architectures are reaching physical limits (speed of light, thermodynamics)
Limit on the number of transistors per square inch
Limit on inter-component link capacitance
Symmetric Multiprocessing (SMP)
Involves a multiprocessor computer architecture where two or more identical processors can connect to a single shared main memory
Kernel can execute on any processor
Typically each processor does self-scheduling from the pool of available processes or threads
Scalability problems in Uniform Memory Access
NUMA to improve speed, but limitations on data migration
Intel, AMD processors are SMP units
What is ASMP ?
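Self-scheduling from a shared pool, as described above, can be imitated with threads pulling work from a common queue. This is a loose analogy, not how an SMP kernel is implemented: the worker threads stand in for identical processors, the names `run_self_scheduling` and `cpu0`, `cpu1`, ... are hypothetical, and the "work" is just squaring a number.

```python
import queue
import threading


def run_self_scheduling(tasks, workers=4):
    """Let identical workers pull tasks from one shared ready queue.

    Assumption: tasks are unique, hashable numbers.
    Returns {task: (worker_name, result)}.
    """
    ready = queue.Queue()
    for task in tasks:
        ready.put(task)

    done = {}
    lock = threading.Lock()  # protects the shared result table

    def worker(name):
        while True:
            try:
                task = ready.get_nowait()  # any worker may take any task
            except queue.Empty:
                return  # pool is empty, nothing left to schedule
            result = task * task
            with lock:
                done[task] = (name, result)

    threads = [
        threading.Thread(target=worker, args=(f"cpu{i}",)) for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    for task, (cpu, result) in sorted(run_self_scheduling(range(8)).items()):
        print(task, cpu, result)
```

Which worker ends up with which task varies from run to run, since no worker is special. The single shared queue and lock also hint at why shared resources become a scalability bottleneck as processors are added.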
SISD : A Conventional Computer
Speed is limited by the rate at which the computer can transfer information internally.
[Figure: a single Processor takes Instructions and Data Input and produces Data Output]
Ex: PC, Macintosh, Workstations
The MISD Architecture
More of an intellectual exercise than a practical configuration. A few have been built, but none is commercially available.
[Figure: Processors A, B and C, each driven by its own instruction stream (Instruction Stream A, B, C), share a single Data Input Stream and produce a Data Output Stream]
SIMD Architecture
Ex: CRAY machine vector processing, Intel MMX (multimedia support)
Ci