Computational Grids
Computational Problems
• Problems that have lots of computations and usually lots of data.
Demand for Computational Speed
• Continual demand for greater computational speed from a computer system than is currently possible
• Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems.
• Computations must be completed within a “reasonable” time period.
Grand Challenge Problems
One that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable.
Examples
• Modeling large DNA structures
• Global weather forecasting
• Modeling motion of astronomical bodies
Weather Forecasting
• Atmosphere modeled by dividing it into 3-dimensional cells.
• Calculations of each cell repeated many times to model passage of time.
Global Weather Forecasting Example

• Suppose whole global atmosphere divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high) - about 5 × 10^8 cells.

• Suppose each calculation requires 200 floating point operations. In one time step, 10^11 floating point operations necessary.

• To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes 10^6 seconds - over 10 days.

• To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 × 10^12 floating point operations/s).
Modeling Motion of Astronomical Bodies
• Each body attracted to each other body by gravitational forces. Movement of each body predicted by calculating total force on each body.
• With N bodies, N - 1 forces to calculate for each body, or approx. N^2 calculations. (N log2 N for an efficient approximate algorithm.)

• After determining new positions of bodies, calculations repeated.
• A galaxy might have, say, 10^11 stars.

• Even if each calculation done in 1 ms (extremely optimistic figure), it takes 10^9 years for one iteration using the N^2 algorithm, and almost a year for one iteration using an efficient N log2 N approximate algorithm.
Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).
High Performance Computing (HPC)
• Traditionally, achieved by using multiple computers together - parallel computing.

• Simple idea! -- Using multiple computers (or processors) simultaneously should be able to solve the problem faster than a single computer.
High Performance Computing
• Long history:
– Multiprocessor systems of various types (1950's onwards)
– Supercomputers (1960's-80's)
– Cluster computing (1990's)
– Grid computing (2000's)?

Maybe, but let's first look at how to achieve HPC.
Speedup Factor

S(p) = Execution time using one processor (best sequential algorithm) / Execution time using a multiprocessor with p processors = ts / tp

where ts is the execution time on a single processor and tp is the execution time on a multiprocessor.

S(p) gives the increase in speed obtained by using the multiprocessor.

Use the best sequential algorithm with the single-processor system. The underlying algorithm for the parallel implementation might be (and usually is) different.
Maximum Speedup
Maximum speedup is usually p with p processors (linear speedup).
Possible to get superlinear speedup (greater than p) but usually a specific reason such as:
• Extra memory in multiprocessor system
• Nondeterministic algorithm
Maximum Speedup: Amdahl's law

With a serial fraction f:

(a) One processor: the serial section takes f ts and the parallelizable sections take (1 - f)ts, for a total of ts.

(b) Multiple processors: with p processors, the serial section still takes f ts but the parallelizable sections take (1 - f)ts/p, giving tp = f ts + (1 - f)ts/p.
Speedup factor is given by:

S(p) = ts / (f ts + (1 - f)ts/p) = p / (1 + (p - 1)f)

This equation is known as Amdahl's law.
Speedup against number of processors

[Graph: speedup factor S(p) (vertical axis, up to 20) against number of processors p (horizontal axis, 4 to 20), plotted for serial fractions f = 0%, 5%, 10%, and 20%. Only f = 0% gives linear speedup; the larger f, the sooner the curve flattens.]
Even with infinite number of processors, maximum speedup limited to 1/f.
Example: With only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
Superlinear Speedup Example - Searching

(a) Searching each sub-space sequentially: the search space is divided into p sub-spaces, each taking time ts/p to search. The solution is found after x complete sub-space searches (x is indeterminate) plus an additional time Δt, i.e. at time x ts/p + Δt after the start.
(b) Searching each sub-space in parallel: all p sub-spaces are searched simultaneously, and the solution is found in time Δt.
Question
What is the speed-up now?
Speed-up then given by

S(p) = (x ts/p + Δt) / Δt
Worst case for sequential search when solution found in last sub-space search. Then parallel version offers greatest benefit, i.e.

S(p) = (((p - 1)/p) ts + Δt) / Δt → ∞ as Δt tends to zero
Least advantage for parallel version when solution found in first sub-space search of the sequential search, i.e.

S(p) = Δt / Δt = 1

Actual speed-up depends upon which sub-space holds the solution but could be extremely large.
Computing Platforms for Parallel Programming
Types of Parallel Computers
Two principal types:
1. Single computer containing multiple processors - main memory is shared, hence called “Shared memory multiprocessor”
2. Interconnected multiple computer systems
Conventional Computer

Consists of a processor executing a program stored in a (main) memory, with instructions and data passing between processor and memory. Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address.
Shared Memory Multiprocessor

• Extend the single-processor model - multiple processors connected to a single shared memory with a single address space.

A real system will have cache memory associated with each processor.
Examples
• Dual Pentiums
• Quad Pentiums
Quad Pentium Shared Memory Multiprocessor

[Diagram: four processors, each with an L1 cache, L2 cache, and bus interface, attached to a processor/memory bus. The bus also connects a memory controller (to the shared memory) and an I/O interface (to the I/O bus).]
Programming Shared Memory Multiprocessors
• Threads - the programmer decomposes the program into parallel sequences (threads), each able to access variables declared outside the threads. Example: Pthreads

• Use a sequential programming language with preprocessor compiler directives, constructs, or syntax to declare shared variables and specify parallelism. Examples: OpenMP (an industry standard), UPC (Unified Parallel C) -- both need special compilers.
• Parallel programming language with syntax to express parallelism. Compiler creates executable code -- not now common.
• Use parallelizing compiler to convert regular sequential language programs into parallel executable code - also not now common.
Message-Passing Multicomputer

Complete computers connected through an interconnection network. Each computer has a processor and local memory; computers communicate by sending messages across the network.
Dedicated cluster with a master node

[Diagram: the user reaches the master node through an external network via one Ethernet interface; a second Ethernet interface connects the master node, through a switch, to the compute nodes of the cluster.]
UNC-C's cluster used for grid course (Department of Computer Science)

[Diagram: four compute nodes, coit-grid01 through coit-grid04, each with two processors (P) and local memory (M), connected by a switch to the external network. The nodes are 3.4 GHz dual Xeon Pentiums.]

Funding for this cluster provided by the University of North Carolina, Office of the President, specifically for the grid computing course.
Programming Clusters
• Usually based upon explicit message-passing.
• Common approach -- a set of user-level libraries for message passing. Examples:
– Parallel Virtual Machine (PVM) - late 1980's. Became very popular in mid 1990's.
– Message-Passing Interface (MPI) - standard defined in 1990's and now dominant.
MPI(Message Passing Interface)
• Message passing library standard developed by group of academics and industrial partners to foster more widespread use and portability.
• Defines routines, not implementation.
• Several free implementations exist.
MPI designed:
• To address some problems with earlier message-passing systems such as PVM.

• To provide a powerful message-passing mechanism and routines - over 126 routines (although it is said that one can write reasonable MPI programs with just 6 MPI routines).
Message-Passing Programming using User-level Message Passing Libraries
Two primary mechanisms needed:
1. A method of creating separate processes for execution on different computers
2. A method of sending and receiving messages
Multiple program, multiple data (MPMD) model

[Diagram: a separate source file is compiled to suit each processor, producing a different executable for each of processors 0 to p - 1.]
Single Program Multiple Data (SPMD) model

Basic MPI way. [Diagram: one source file is compiled to suit each processor, and each of processors 0 to p - 1 runs its own copy of the same executable.]

Different processes merged into one program. Control statements select different parts for each processor to execute.
Multiple Program Multiple Data (MPMD) Model

[Diagram: process 1 calls spawn() to start execution of process 2 at a later point in time.]

Separate programs for each processor. One processor executes the master process. Other processes are started from within the master process - dynamic process creation.

Can be done with MPI version 2.
Communicators

• Defines scope of a communication operation.
• Processes have ranks associated with communicator.
• Initially, all processes enrolled in a “universe” called MPI_COMM_WORLD, and each process is given a unique rank, a number from 0 to p - 1, with p processes.
• Other communicators can be established for groups of processes.
Using SPMD Computational Model
main (int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    .
    .
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
    if (myrank == 0)
        master();
    else
        slave();
    .
    .
    MPI_Finalize();
}

where master() and slave() are to be executed by the master process and slave process, respectively.
Basic "point-to-point" Send and Receive Routines

Passing a message between processes using send() and recv() library calls (generic syntax; actual formats later):

[Diagram: process 1 executes send(&x, 2); process 2 executes recv(&y, 1); the data moves from x in process 1 to y in process 2.]
Message Tag
• Used to differentiate between different types of messages being sent.
• Message tag is carried within message.
• If special type matching is not required, a wild card message tag is used, so that the recv() will match with any send().
Message Tag Example

To send a message, x, with message tag 5 from a source process, 1, to a destination process, 2, and assign to y:

[Diagram: process 1 executes send(&x, 2, 5); process 2 executes recv(&y, 1, 5), which waits for a message from process 1 with a tag of 5.]
Synchronous Message Passing
Routines return when message transfer completed.
Synchronous send routine
• Waits until the complete message can be accepted by the receiving process before sending the message.

Synchronous receive routine
• Waits until the message it is expecting arrives.
Synchronous send() and recv() using 3-way protocol

(a) When send() occurs before recv(): process 1 issues a request to send and suspends; when process 2 reaches its recv(), it returns an acknowledgment; the message is then transferred and both processes continue.
(b) When recv() occurs before send(): process 2 suspends in recv(); when process 1 reaches its send(), it issues a request to send, process 2 acknowledges, the message is transferred, and both processes continue.
• Synchronous routines intrinsically perform two actions:
– They transfer data, and
– They synchronize processes.
Asynchronous Message Passing
• Routines that do not wait for actions to complete before returning. Usually require local storage for messages.
• More than one version depending upon the actual semantics for returning.
• In general, they do not synchronize processes but allow processes to move forward sooner. Must be used with care.
MPI Blocking and Non-Blocking
• Blocking - return after their local actions complete, though the message transfer may not have been completed.
• Non-blocking - return immediately. Assumes that data storage used for transfer not modified by subsequent statements prior to being used for transfer, and it is left to the programmer to ensure this.
These terms may have different interpretations in other systems.
How message-passing routines return before message transfer completed

A message buffer is needed between source and destination to hold the message: process 1's send() deposits the message in the buffer and the process continues; process 2's recv() later reads the message from the buffer.
Asynchronous routines changing to synchronous routines
• Buffers are only of finite length, and a point could be reached when the send routine is held up because all available buffer space is exhausted.

• Then the send routine will wait until storage becomes available again - i.e., the routine then behaves as a synchronous routine.
Parameters of MPI blocking send

MPI_Send(buf, count, datatype, dest, tag, comm)

buf      - address of send buffer
count    - number of items to send
datatype - datatype of each item
dest     - rank of destination process
tag      - message tag
comm     - communicator
Parameters of MPI blocking receive

MPI_Recv(buf, count, datatype, src, tag, comm, status)

buf      - address of receive buffer
count    - max. number of items to receive
datatype - datatype of each item
src      - rank of source process
tag      - message tag
comm     - communicator
status   - status after operation
Example
To send an integer x from process 0 to process 1:

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
if (myrank == 0) {
    int x;
    MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    int x;
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
MPI Nonblocking Routines
• Nonblocking send - MPI_Isend() - will return “immediately” even before source location is safe to be altered.
• Nonblocking receive - MPI_Irecv() - will return even if no message to accept.
Detecting when a message sent with a non-blocking send routine has been received
Completion detected by MPI_Wait() and MPI_Test().
MPI_Wait() waits until operation completed and returns then.
MPI_Test() returns with flag set indicating whether operation completed at that time.
Need to know which particular send you are waiting for.
Identified with request parameter.
Example
To send an integer x from process 0 to process 1 and allow process 0 to continue,
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
if (myrank == 0) {
    int x;
    MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
    compute();
    MPI_Wait(&req1, &status);
} else if (myrank == 1) {
    int x;
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
“Group” message passing routines
Have routines that send message(s) to a group of processes or receive message(s) from a group of processes
Higher efficiency than separate point-to-point routines although not absolutely necessary.
Broadcast

Sending same message to a group of processes. (Sometimes "multicast" - sending same message to a defined group of processes; "broadcast" - to all processes.)

[Diagram: each of processes 0 to p - 1 calls MPI_Bcast(); the data in the root's buffer (buf) is copied to every process.]
MPI Broadcast routine
int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
Actions: broadcasts message from root process to all processes in comm, including itself.

Parameters:
*buf     - message buffer
count    - number of entries in buffer
datatype - data type of buffer
root     - rank of root
Scatter

Sending each element of an array in the root process to a separate process. Contents of the ith location of the array are sent to the ith process.

[Diagram: each of processes 0 to p - 1 calls MPI_Scatter(); part i of the root's buffer (buf) becomes the data of process i.]
Gather

Having one process collect individual values from a set of processes.

[Diagram: each of processes 0 to p - 1 calls MPI_Gather(); the data of process i is placed in part i of the root's buffer (buf).]
Reduce

Gather operation combined with specified arithmetic/logical operation.

Example: values could be gathered and then added together by the root:

[Diagram: each of processes 0 to p - 1 calls MPI_Reduce(); the data values are combined with "+" into the root's buffer (buf).]
Collective Communication

Involves set of processes, defined by an intra-communicator. Message tags not present. Principal collective operations:

• MPI_Bcast() - broadcast from root to all other processes
• MPI_Gather() - gather values for group of processes
• MPI_Scatter() - scatters buffer in parts to group of processes
• MPI_Alltoall() - sends data from all processes to all processes
• MPI_Reduce() - combine values on all processes to single value
• MPI_Reduce_scatter() - combine values and scatter results
• MPI_Scan() - compute prefix reductions of data on processes
Example
To gather items from a group of processes into process 0, using dynamically allocated memory in the root process:

int data[10];   /* data to be gathered from processes */
int *buf, grp_size, myrank;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);           /* find rank */
if (myrank == 0) {
    MPI_Comm_size(MPI_COMM_WORLD, &grp_size);     /* find group size */
    buf = (int *)malloc(grp_size * 10 * sizeof(int));  /* allocate memory */
}
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);

Note that the recvcount argument (the second 10) is the number of items received from each process, not the total.

MPI_Gather() gathers from all processes, including the root.
Sample MPI program

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) {                     /* open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
    }
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast data */
    x = MAXSIZE / numprocs;              /* add my portion of the data */
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);
    /* compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);
    MPI_Finalize();
    return 0;
}
Debugging/Evaluating Parallel Programs Empirically
Visualization Tools
Programs can be watched as they are executed in a space-time diagram (or process-time diagram):

(Figure: a horizontal line for each of Process 1, Process 2, and Process 3 plotted against time; line segments distinguish computing, waiting, and time spent in a message-passing system routine, with arrows between lines showing messages.)

Visualization tools are available for MPI. An example is Upshot.
Evaluating Programs Empirically
Measuring Execution Time

To measure the execution time between point L1 and point L2 in the code, we might have a construction such as:

t1 = MPI_Wtime();   /* start */
    .
    .
t2 = MPI_Wtime();   /* end */
elapsed_time = t2 - t1;   /* elapsed time */
printf("Elapsed time = %5.2f seconds\n", elapsed_time);

MPI provides the routine MPI_Wtime() for returning the wall-clock time (in seconds).
Executing MPI programs
• The MPI version 1 standard does not address implementation and does not specify how programs are to be started; each implementation has its own way.
Compiling/Executing MPI Programs
Basics
For MPICH, use two commands:
• mpicc to compile a program
• mpirun to execute a program
mpicc
Example
mpicc -o hello hello.c
compiles hello.c to create the executable hello.
mpicc is (probably) a script calling cc and hence all regular cc flags can be attached.
mpirun
Example
mpirun -np 3 hello
executes 3 instances of hello on the local machine (when using MPICH).
Using multiple computers
First create a file (say called "machines") containing the list of computers you want to use.

Example
coit-grid01.uncc.edu
coit-grid02.uncc.edu
coit-grid03.uncc.edu
coit-grid04.uncc.edu
Then specify the machines file in the mpirun command:
Example
mpirun -np 3 -machinefile machines hello
executes 3 instances of hello using the computers listed in the file. (Scheduling will be round-robin unless otherwise specified.)
MPI-2
• The MPI standard, version 2, does recommend a command for starting MPI programs, namely:
mpiexec -n # prog
where # is the number of processes and prog is the program.
Sample MPI Programs
Hello World
Printing out the rank of each process

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank, numprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    printf("Hello World from process %d of %d\n", myrank, numprocs);
    MPI_Finalize();
    return 0;
}
Question
Suppose this program is compiled as helloworld and is executed on a single computer with the command:
mpirun -np 4 helloworld
What would the output be?
Answer
Several possible outputs depending upon order processes are executed.
Example
Hello World from process 2 of 4
Hello World from process 0 of 4
Hello World from process 1 of 4
Hello World from process 3 of 4
Adding communication to get process 0 to print all messages:

#include "mpi.h"
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int i, myrank, numprocs;
    char greeting[80];   /* message sent from slaves to master */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    sprintf(greeting, "Hello World from process %d of %d", myrank, numprocs);
    if (myrank == 0) {                  /* I am going to print out everything */
        printf("%s\n", greeting);       /* print greeting from process 0 */
        for (i = 1; i < numprocs; i++) {   /* print greetings in rank order */
            MPI_Recv(greeting, sizeof(greeting), MPI_CHAR, i, 1,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", greeting);
        }
    } else {
        MPI_Send(greeting, strlen(greeting)+1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
MPI_Get_processor_name()

Returns the name of the processor executing the code (and the length of the name string). Arguments:

MPI_Get_processor_name(char *name, int *resultlen)

Example
int namelen;
char procname[MPI_MAX_PROCESSOR_NAME];
MPI_Get_processor_name(procname, &namelen);   /* name returned in procname */
Easy then to add the name to the greeting with:

sprintf(greeting, "Hello World from process %d of %d on %s\n", myrank, numprocs, procname);
Pinging processes and timing
Master-slave structure

#include <mpi.h>
#include <stdio.h>

void master(void);
void slave(void);

int main(int argc, char **argv)
{
    int myrank;

    printf("This is my ping program\n");
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        master();
    } else {
        slave();
    }
    MPI_Finalize();
    return 0;
}
Master routine

void master(void)
{
    int x = 9;
    double starttime, endtime;
    MPI_Status status;

    printf("I am the master - Send me a message when you receive this number %d\n", x);
    starttime = MPI_Wtime();
    MPI_Send(&x, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Recv(&x, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &status);
    endtime = MPI_Wtime();
    printf("I am the master. I got this back %d\n", x);
    printf("That took %f seconds\n", endtime - starttime);
}
Slave routine

void slave(void)
{
    int x;
    MPI_Status status;

    printf("I am the slave - working\n");
    MPI_Recv(&x, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    printf("I am the slave. I got this %d\n", x);
    MPI_Send(&x, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
}
Example using collective routines
MPI_Bcast()
MPI_Reduce()
Adding numbers in a file.
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) {                     /* open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
    }
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast data */
    x = MAXSIZE / numprocs;              /* add my portion of the data */
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);
    /* compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);
    MPI_Finalize();
    return 0;
}
C Program Command Line Arguments
A normal C program specifies command line arguments to be passed to main with:
int main(int argc, char *argv[])
where
• argc is the argument count, and
• argv[] is an array of character pointers.
  – The first entry is a pointer to the program name.
  – Subsequent entries point to the subsequent strings on the command line.
MPI C program command line arguments
• Implementations of MPI remove from the argv array any command line arguments used by the implementation.
• Note that MPI_Init() requires argc and argv (specified as addresses).
Example
Getting a command line argument:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n;
    MPI_Init(&argc, &argv);
    /* get and convert character string argument to integer value */
    n = atoi(argv[1]);
    MPI_Finalize();
    return 0;
}
Executing an MPI program with command line arguments:

mpirun -np 2 myProg 56 123

The -np 2 arguments are removed by MPI, probably by MPI_Init(), so afterwards argv[0] points to the program name (myProg), argv[1] to "56", and argv[2] to "123". Remember these array elements hold pointers to the argument strings.
More Information on MPI
• Book: W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd ed., The MIT Press, 1999.
• MPICH: http://www.mcs.anl.gov/mpi
• LAM MPI: http://www.lam-mpi.org
Parallel Programming Home Page
http://www.cs.uncc.edu/par_prog
Gives step-by-step instructions for compiling and executing programs, and other information.
Grid-enabled MPI
Several versions of MPI developed for a grid:
• MPICH-G, MPICH-G2
• PACX-MPI
MPICH-G2 is based on MPICH and uses Globus.
MPI code for the grid
No difference in code from regular MPI code.
Key aspect is MPI implementation:
• Communication methods
• Resource management
Communication Methods
• The implementation should take into account whether messages are between processors on the same computer or processors on different computers on the network.

• Pack messages into fewer, larger messages, even if this requires more computation.
MPICH-G2
• Complete implementation of MPI
• Can use existing MPI programs on a grid without change
• Uses Globus to start tasks, etc.
• Version 2 is a complete redesign of MPICH-G for Globus 2.2 or later.
Compiling Application Program
As with regular MPI programs, compile on each machine you intend to use and make the executables accessible to those computers.
Running an MPICH-G2 Programmpirun
• submits a Globus RSL script (Resource Specification Language Script) to launch application
• RSL script can be created by mpirun or you can write your own.
• RSL script gives powerful mechanism to specify different executables etc., but low level.
mpirun (with it constructing the RSL script)

• Use if you want to launch a single executable on binary-compatible machines with a shared file system.

• Requires a "machines" file: a list of computers to be used (and their job managers).
“Machines” file
• Computers are listed by their Globus job manager service, followed by an optional maximum number of nodes (tasks) on that machine.

• If the job manager is omitted (i.e., just the name of the computer), it defaults to the Globus job manager.
Location of “machines” file
• The mpirun command expects the "machines" file either in
  – the directory specified by the -machinefile flag,
  – the current directory used to execute the mpirun command, or
  – <MPICH_INSTALL_PATH>/bin/machines
Running MPI program
• Uses the same command line as a regular MPI program:
mpirun -np 25 my_prog
creates 25 tasks allocated to the machines in the "machines" file in a round-robin fashion.
Example
With the machines file containing:

"coit-grid01.uncc.edu" 4
"coit-grid02.uncc.edu" 5

and the command:

mpirun -np 10 myProg

the first 4 processes (jobs) would run on coit-grid01, the next 5 on coit-grid02, and the remaining one on coit-grid01.
mpirun with your own RSL script

• Necessary if the machines are not executing the same executable.

• The easiest way to create a script is to modify an existing one.

• Use mpirun -dumprsl
  – Causes the script to be printed out. The application program is not launched.
Example
mpirun -dumprsl -np 2 myprog
will generate an appropriate printout of an RSL document according to the details of the job from the command line and the machines file.
Given an RSL file, myRSL.rsl, use:

mpirun -globusrsl myRSL.rsl

to submit the modified script.
MPICH-G2 internals
• Processes are allocated a "machine-local" number and a "grid-global" number, translated into where each process actually resides.
• Non-local operations use grid services; local operations do not.
• The globusrun command submits the simultaneous job requests.
Limitations
• The "machines" file limits computers to those known - there is no discovery of resources.

• If the machines are heterogeneous, appropriate executables must be available, along with an RSL script.

• Speed is an issue - the original version, MPICH-G, was slow.
More information on MPICH-G2
http://www.niu.edu/mpi
http://www.globus.org/mpi
http://www.hpclab.niu.edu/mpi/g2_body.htm
Parallel Programming Techniques
Suitable for a Grid
Message-Passing on a Grid
• VERY expensive: sending data across the network costs millions of cycles
• Bandwidth shared with other users
• Links unreliable
Computational Strategies
• As a computing platform, a grid favors situations with the absolute minimum of communication between computers.
Strategies
With no/minimum communication:

• "Embarrassingly parallel" computations: those computations which obviously can be divided into parallel independent parts, with the parts executed on separate computers.

• Separate instances of the same problem executing on each system, each using different data.
Embarrassingly Parallel Computations

A computation that can obviously be divided into a number of completely independent parts, each of which can be executed by a separate process(or).

(Figure: input data divided among processes; each process computes independently and delivers its results.)

No communication, or very little communication, between processes. Each process can do its tasks without any interaction with the other processes.
Monte Carlo Methods
• An embarrassingly parallel computation.
• Monte Carlo methods make use of random selections.
Simple Example: To calculate π
A circle is formed within a square. The circle has radius 1, so the square has sides 2 × 2.

Area of circle = π; total area of square = 4.
The ratio of the area of the circle to the area of the square is given by

area of circle / area of square = π(1)² / (2 × 2) = π/4

• Points within the square are chosen randomly.
• A score is kept of how many points happen to lie within the circle.
• The fraction of points within the circle will be π/4, given a sufficient number of randomly selected samples.
The method actually computes an integral. One quadrant of the construction can be described by the integral

∫₀¹ √(1 − x²) dx = π/4

i.e., the area under the curve y = √(1 − x²) between x = 0 and x = 1.
So the method can be used to compute any integral!

Monte Carlo methods are very useful if the function cannot easily be integrated by standard numerical methods (perhaps because it has a large number of variables).
Alternative (better) "Monte Carlo" Method
Use random values of x to compute f(x) and sum the values of f(x):

Area = ∫_{x1}^{x2} f(x) dx = lim_{N→∞} (1/N) Σ_{i=1}^{N} f(x_r) · (x2 − x1)

where x_r are randomly generated values of x between x1 and x2.
Example
Computing the integral

I = ∫_{x1}^{x2} (x² − 3x) dx

Sequential code:

sum = 0;
for (i = 0; i < N; i++) {              /* N random samples */
    xr = rand_v(x1, x2);               /* next random value */
    sum = sum + xr * xr - 3 * xr;      /* compute f(xr) */
}
area = (sum / N) * (x2 - x1);

rand_v(x1, x2) returns a pseudorandom number between x1 and x2.
For parallelizing Monte Carlo code, the best way to generate random numbers in parallel must be addressed.
Can use SPRNG (Scalable Pseudo-random Number Generator) -- supposed to be a good parallel random number generator.
Executing separate problem instances
In some application areas, the same program is executed repeatedly, ideally with different parameters (a "parameter sweep").

Nimrod/G is a grid broker project that targets parameter sweep problems.
Techniques to reduce effects of network communication
• Latency hiding with communication/computation overlap
• Better to have fewer larger messages than many smaller ones
Synchronous Algorithms
• Many traditional parallel algorithms require the parallel processes to synchronize at regular and frequent intervals to exchange data and continue from known points.

This is bad for grid computations!!
All traditional parallel algorithms books have to be thrown away for grid computing.
Techniques to reduce actual synchronization communications

• Asynchronous algorithms: algorithms that do not use synchronization at all.

• Partially synchronous algorithms: those that limit the synchronization, for example synchronizing only every n iterations.
  – Such algorithms have actually been known for many years but have not been popularized.
![Page 133: Computational Grids. Computational Problems Problems that have lots of computations and usually lots of data.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d4d5503460f94a2c928/html5/thumbnails/133.jpg)
Big Problems“Grand challenge” problems
Most of the high-profile projects on the grid involve problems that are so big, usually in the number of data items, that they cannot be solved otherwise.
Examples
• High-energy physics
• Bioinformatics
• Medical databases
• Combinatorial chemistry
• Astrophysics
![Page 135: Computational Grids. Computational Problems Problems that have lots of computations and usually lots of data.](https://reader036.fdocuments.us/reader036/viewer/2022062516/56649d4d5503460f94a2c928/html5/thumbnails/135.jpg)
Workflow Technique
• Use functional decomposition: divide the problem into separate functional units, where each unit takes results from other units and passes its own results on. The interconnection pattern depends upon the problem.

• Workflow describes the flow of information between the units.
Example
Climate Modeling

(Figure: workflow linking an atmospheric circulation model, an atmospheric chemistry model, a hydrology model, a land surface model, an ocean circulation model, and an ocean chemistry model. Data flowing between the units includes heating rates; water vapor content, humidity, pressure, wind velocities, and temperature; sea surface temperature; and wind stress, heat flux, and water flux.)