Distributed Computing

Sudarsun Santhiappan
sudarsun@{burning-glass.com, gmail.com}
Burning Glass Technologies
Kilpauk, Chennai 600010

Technology is Changing...

Computational power doubles roughly every 18 months

Networking bandwidth and speed double roughly every 9 months

How do we tap the benefits of this technology?

Should we grow as an individual?

Should we grow as a team?

The Coverage Today

Parallel Processing

Multiprocessor or Multi-Core Computing

Symmetric Multiprocessing

Cluster Computing {PVM}

Distributed Computing {TAO, OpenMP}

Grid Computing {Globus Toolkit}

Cloud Computing {Amazon EC2}

Parallel Computing

It is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently.

Multi-Core, Multiprocessor SMP, Massively Parallel Processing (MPP) Computers

Is it easy to write a parallel program?
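The divide-and-combine idea above can be sketched in a few lines. This is a minimal illustration, not a recommended design: the helper names `split` and `parallel_sum_of_squares` and the choice of four workers are assumptions made for the example. Threads are used only to show the structure; in standard CPython, CPU-bound work usually needs a process pool to see a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, workers=4):
    """Solve each sub-problem concurrently, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: sum(x * x for x in c), chunks))
    return sum(partials)


result = parallel_sum_of_squares(list(range(1, 101)))
print(result)
```

Even in this small case the program has to decide how to split the data, how many workers to use, and how to merge the partial results, which hints at why parallel programs are harder to write than sequential ones.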

Cluster Computing

A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer

Operate in shared memory mode (mostly)

Tightly coupled with high-speed networking, mostly with optical fiber channels.

Types: high-availability (HA), load-balancing, and compute clusters

Can we load balance using DNS?
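One common answer to the question above is round-robin DNS, where a name maps to several addresses and the server rotates their order on each reply. The sketch below only simulates that rotation in memory; the class name `RoundRobinResolver` and the addresses are hypothetical, and no real DNS library is involved.

```python
from collections import deque


class RoundRobinResolver:
    """Toy model of a DNS server that rotates its address list per query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        # Return the current ordering, then rotate so the next query
        # sees a different first address.
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


resolver = RoundRobinResolver(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# Most clients connect to the first address in the reply.
first_choices = [resolver.resolve()[0] for _ in range(6)]
print(first_choices)
```

The rotation spreads new connections across servers, but it knows nothing about server load or health, and client-side caching can skew the distribution.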

Distributed Computing

Wikipedia: It deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime

Grid Computing

Wikipedia: A form of distributed computing whereby a super and virtual computer is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform large tasks.

pcwebopedia.com: Unlike conventional networks that focus on communication among devices, grid computing harnesses unused processing cycles of all computers in a network for solving problems too intensive for any stand-alone machine.

IBM: Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities. Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer.

Sun: Grid Computing is a computing infrastructure that provides dependable, consistent, pervasive and inexpensive access to computational capabilities.

Cloud Computing

Wikipedia: It is a style of computing in which dynamically scalable and often virtualised resources are provided as a service over the Internet.

Infrastructure as a Service (IaaS)

Platform as a Service (PaaS)

Software as a Service (SaaS)

Providers deliver common business applications online, accessible from a web browser.

Examples: Amazon Elastic Compute Cloud (EC2), Google Apps

Hardware: IBM p690 Regatta

32 POWER4 CPUs (1.1 GHz)
32 GB RAM
218 GB internal disk
OS: AIX 5.1
Peak speed: 140.8 GFLOP/s*
Programming model: shared memory multithreading (OpenMP) (also supports MPI)

*GFLOP/s: billion floating point operations per second

Hardware: Pentium4 Xeon Cluster

270 Pentium4 XeonDP CPUs
270 GB RAM
8,700 GB disk
OS: Red Hat Linux Enterprise 3
Peak speed: 1.08 TFLOP/s*
Programming model: distributed multiprocessing (MPI)

*TFLOP/s: trillion floating point operations per second

Hardware: Itanium2 Cluster

schooner.oscer.ou.edu (new arrival!)

56 Itanium2 1.0 GHz CPUs
112 GB RAM
5,774 GB disk
OS: Red Hat Linux Enterprise 3
Peak speed: 224 GFLOP/s*
Programming model: distributed multiprocessing (MPI)

*GFLOP/s: billion floating point operations per second
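The peak speeds quoted for these machines are consistent with the usual rule of thumb: peak = number of CPUs × clock rate × floating point operations per cycle. The sketch below checks that arithmetic. The value of 4 operations per cycle is an assumption, not stated in the slides; it is chosen because it reproduces the quoted figures for the POWER4 and Itanium2 systems. The Xeon clock rate is not given, so only the implied per-CPU rate is derived.

```python
def peak_gflops(cpus, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOP/s: CPUs x clock (GHz) x FLOPs per cycle."""
    return cpus * clock_ghz * flops_per_cycle


# flops_per_cycle=4 is an assumption that matches the quoted peaks.
regatta = peak_gflops(cpus=32, clock_ghz=1.1, flops_per_cycle=4)
itanium = peak_gflops(cpus=56, clock_ghz=1.0, flops_per_cycle=4)

# Xeon cluster: 1.08 TFLOP/s over 270 CPUs gives the implied per-CPU peak.
xeon_per_cpu = 1080 / 270

print(round(regatta, 1), itanium, xeon_per_cpu)
```

Peak figures like these are upper bounds; sustained performance on real workloads is normally lower.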

Vector Processing

It is based on array processors whose instruction set includes operations that perform the same mathematical operation on multiple data elements simultaneously.

Example: computing the scalar (dot) product of two vectors

Is vector processing a parallel computing model?

What are the limitations of vector processing?

Used extensively in video processing and games
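The dot-product example can be sketched as follows. This is plain Python that only mimics the access pattern of a vector unit: `dot_vectorized` handles `lanes` elements per step and keeps one partial sum per lane, the way a vector instruction would. The function names and the lane width of 4 are assumptions for illustration; real vector hardware executes each block in a single instruction.

```python
def dot_scalar(a, b):
    """One multiply-add per loop iteration (scalar processing)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_vectorized(a, b, lanes=4):
    """Process `lanes` elements per step, one partial sum per lane."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    partial = [0.0] * lanes
    for start in range(0, len(a), lanes):
        block_a = a[start:start + lanes]
        block_b = b[start:start + lanes]
        # On vector hardware this inner loop is a single instruction.
        for lane, (x, y) in enumerate(zip(block_a, block_b)):
            partial[lane] += x * y
    # Final reduction across lanes.
    return sum(partial)


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(dot_scalar(a, b), dot_vectorized(a, b))
```

The final cross-lane reduction and the handling of a last, partially filled block are two places where vector code needs extra care.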

Pipelined Processing

The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step.

This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once.

A non-pipelined architecture is inefficient because some CPU components (modules) sit idle while another module is active during the instruction cycle.

Pipelined processors are organized internally into stages that can work semi-independently on separate jobs.
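The timing argument above can be made concrete with a small calculation. This sketch assumes an ideal pipeline with no stalls or hazards, and the step durations are hypothetical values in nanoseconds chosen for the example. The pipeline is clocked at the rate of the slowest step; once it is full, one instruction completes per cycle.

```python
def sequential_time(step_times, n_instructions):
    """Each instruction runs all steps before the next one starts."""
    return n_instructions * sum(step_times)


def pipelined_time(step_times, n_instructions):
    """Ideal pipeline: cycle time is set by the slowest step."""
    cycle = max(step_times)
    stages = len(step_times)
    # The first instruction needs `stages` cycles; each later one adds a cycle.
    return (stages + n_instructions - 1) * cycle


steps = [2, 3, 2, 1]  # hypothetical step durations in ns
seq = sequential_time(steps, 100)
pipe = pipelined_time(steps, 100)
print(seq, pipe, seq / pipe)
```

With unbalanced steps the speed-up stays below the number of stages, because every stage has to wait for the slowest one.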

Parallel Vs Pipelined Processing

Parallel processing

P1: a1 a2 a3 a4
P2: b1 b2 b3 b4
P3: c1 c2 c3 c4
P4: d1 d2 d3 d4

Less inter-processor communication
Complicated processor hardware

Pipelined processing

P1: a1 b1 c1 d1
P2: a2 b2 c2 d2
P3: a3 b3 c3 d3
P4: a4 b4 c4 d4

More inter-processor communication
Simpler processor hardware

Legend: each row lists a processor's operations in time order; a, b, c, d are the different data streams processed; colors in the original figure mark the different types of operations performed.

Data Dependence

Parallel processing requires NO data dependence between processors

Pipelined processing will involve inter-processor communication
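The difference can be seen in two small loops. In the first, every element is processed independently, so the work can be handed to separate workers with no communication. In the second, each result needs the previous one, so as written the steps have to run in order. The function names are assumptions for this sketch, and threads stand in for processors.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def independent(values):
    """No data dependence: each element can go to any worker."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, values))


def running_total(values):
    """Loop-carried dependence: step i needs the result of step i-1."""
    out, acc = [], 0
    for v in values:
        acc += v
        out.append(acc)
    return out


print(independent([1, 2, 3, 4]))
print(running_total([1, 2, 3, 4]))
```

Dependent computations like the running total can sometimes be restructured for parallel execution, but that requires a different algorithm and extra communication between workers.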

[Figure: processors P1 to P4 plotted against time, once for parallel processing and once for pipelined processing]

Typical Computing Elements

[Figure: layered view of a multi-processor computing system. Layers: Applications; Programming paradigms; Threads Interface; Operating System (Microkernel); Hardware (processors, each marked P). Legend: Processor (P), Thread, Process.]

Why Parallel Processing?

Computation requirements are ever increasing, for instance: visualization, distributed databases, simulations, scientific prediction (e.g. climate, earthquakes), etc.

Sequential architectures are reaching physical limits (speed of light, thermodynamics)

Limit on the number of transistors per square inch

Limit on inter-component link capacitance

Symmetric Multiprocessing (SMP)

Involves a multiprocessor computer architecture where two or more identical processors connect to a single shared main memory

Kernel can execute on any processor

Typically each processor does self-scheduling from the pool of available processes or threads

Scalability problems in Uniform Memory Access

NUMA to improve speed, but limitations on data migration

Intel, AMD processors are SMP units

What is ASMP?
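Self-scheduling from a shared pool can be illustrated with a small analogy in which threads play the role of the identical processors and a queue plays the role of the ready pool. This is a sketch of the idea only, not how an operating system kernel implements it; the function name `run_self_scheduling` and the squaring "work" are hypothetical.

```python
import queue
import threading


def run_self_scheduling(tasks, n_workers=4):
    """Each worker repeatedly takes the next ready task from a shared pool."""
    ready = queue.Queue()
    for task in tasks:
        ready.put(task)

    results = {}
    lock = threading.Lock()  # protects the shared results dictionary

    def worker():
        while True:
            try:
                n = ready.get_nowait()  # self-schedule: grab the next task
            except queue.Empty:
                return  # pool is empty, this worker is done
            value = n * n  # stand-in for real work
            with lock:
                results[n] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_self_scheduling(range(10))
print(results)
```

Because all workers share one pool and one memory, contention on those shared structures grows with the number of workers, which mirrors the scalability problem noted above for uniform memory access.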

SISD : A Conventional Computer

Speed is limited by the rate at which the computer can transfer information internally.

[Figure: Instructions and Data Input feed a single Processor, which produces Data Output]

Ex: PC, Macintosh, workstations

The MISD Architecture

More of an intellectual exercise than a practical configuration. A few have been built, but none is commercially available.

[Figure: one Data Input Stream and one Data Output Stream, with Processors A, B, and C each driven by its own Instruction Stream (A, B, C)]

SIMD Architecture

Ex: CRAY machine vector processing, Intel MMX (multimedia support)

Ci