Parallel Programming
![Page 1: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/1.jpg)
1
Parallel Programming
Aaron Bloomfield
CS 415
Fall 2005
![Page 2: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/2.jpg)
2
Why Parallel Programming?
• Predict weather
• Predict spread of SARS
• Predict path of hurricanes
• Predict oil slick propagation
• Model growth of bio-plankton/fisheries
• Structural simulations
• Predict path of forest fires
• Model formation of galaxies
• Simulate nuclear explosions
![Page 3: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/3.jpg)
3
Code that can be parallelized
do i = 1, max
  a(i) = b(i) + c(i) * d(i)
end do
![Page 4: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/4.jpg)
4
Parallel Computers
• Programming model types
  – Shared memory
  – Message passing
![Page 5: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/5.jpg)
5
Distributed Memory Architecture
• Each processor has direct access only to its local memory
• Processors are connected via a high-speed interconnect
• Data structures must be distributed
• Data exchange is done via explicit processor-to-processor communication: send/receive messages (see the sketch below)
• Programming models
  – Widely used standard: MPI
  – Others: PVM, Express, P4, Chameleon, PARMACS, ...
[Diagram: processors P0, P1, ..., Pn, each with its own local memory, connected by a communication interconnect]
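Not on the original slide: a minimal C sketch of the send/receive model described above, assuming a standard MPI installation (compiled with mpicc and run with at least two processes, e.g. mpirun -np 2).

```c
/* Minimal sketch of explicit message passing: rank 0 sends an array to
 * rank 1, which receives it. Values are arbitrary illustration data. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double data[4] = {1.0, 2.0, 3.0, 4.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* explicit processor-to-processor communication: send to rank 1 */
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %g %g %g %g\n",
               data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}
```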
![Page 6: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/6.jpg)
6
Message Passing Interface
MPI provides:
• Point-to-point communication
• Collective operations
  – Barrier synchronization
  – Gather/scatter operations
  – Broadcast, reductions (see the sketch below)
• Different communication modes
  – Synchronous/asynchronous
  – Blocking/non-blocking
  – Buffered/unbuffered
• Predefined and derived datatypes
• Virtual topologies
• Parallel I/O (MPI-2)
• C/C++ and Fortran bindings
• http://www.mpi-forum.org
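Not on the original slide: a hedged C sketch of two of the collective operations listed above, a broadcast from rank 0 followed by a sum reduction back to rank 0. The values are arbitrary illustration data.

```c
/* Sketch of MPI collectives: broadcast a value to all ranks, then
 * reduce per-rank results back to rank 0 with a sum. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    int n = 0, local, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        n = 100;                      /* value to distribute */

    /* broadcast: every rank receives n from rank 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    local = n + rank;                 /* each rank computes something */

    /* reduction: sum the per-rank values onto rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %d\n", size, total);

    MPI_Finalize();
    return 0;
}
```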
![Page 7: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/7.jpg)
7
Shared Memory Architecture
• Processors have direct access to global memory and I/O through a bus or fast switching network
• Cache coherency protocol guarantees consistency of memory and I/O accesses
• Each processor also has its own memory (cache)
• Data structures are shared in global address space
• Concurrent access to shared memory must be coordinated (see the sketch below)
• Programming Models
  – Multithreading (thread libraries)
  – OpenMP
[Diagram: processors P0, P1, ..., Pn, each with its own cache, connected by a shared bus to global shared memory]
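Not on the original slide: a minimal C sketch of the thread-library model mentioned above. All threads share the counter in the global address space, so access is coordinated with a mutex; the thread count and loop bound are arbitrary illustration values. Compile with -pthread.

```c
/* Threads share the global address space; a mutex coordinates access
 * to the shared counter. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long counter = 0;                        /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);              /* coordinate access */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    printf("counter = %ld\n", counter);         /* NTHREADS * 100000 */
    return 0;
}
```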
![Page 8: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/8.jpg)
8
OpenMP
• OpenMP: portable shared memory parallelism
• Higher-level API for writing portable multithreaded applications
• Provides a set of compiler directives and library routines for parallel application programmers (a short example follows below)
• API bindings for Fortran, C, and C++
http://www.OpenMP.org
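Not on the original slide: a minimal sketch of the loop from slide 3 written in C and parallelized with an OpenMP compiler directive. Array contents and sizes are arbitrary illustration values; assumes a compiler with OpenMP support (e.g. gcc -fopenmp).

```c
/* The independent iterations of the loop are split across threads by
 * the OpenMP directive. */
#include <stdio.h>

#define MAX 1000

int main(void)
{
    double a[MAX], b[MAX], c[MAX], d[MAX];

    for (int i = 0; i < MAX; i++) {             /* arbitrary input values */
        b[i] = i;
        c[i] = 2.0;
        d[i] = 0.5;
    }

    /* each iteration touches only index i, so iterations can run in parallel */
    #pragma omp parallel for
    for (int i = 0; i < MAX; i++)
        a[i] = b[i] + c[i] * d[i];

    printf("a[MAX-1] = %g\n", a[MAX - 1]);
    return 0;
}
```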
![Page 9: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/9.jpg)
9
![Page 10: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/10.jpg)
10
Approaches
• Parallel Algorithms
• Parallel Language
• Message passing (low-level)
• Parallelizing compilers
![Page 11: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/11.jpg)
11
Parallel Languages
• CSP - Hoare’s notation for parallelism as a network of sequential processes exchanging messages.
• Occam - A real language based on CSP, used to program the transputer (primarily in Europe).
![Page 12: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/12.jpg)
12
Fortran for parallelism
• Fortran 90 - Array language. Triplet notation for array sections. Operations and intrinsic functions possible on array sections.
• High Performance Fortran (HPF) - Similar to Fortran 90, but includes data layout specifications to help the compiler generate efficient code.
![Page 13: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/13.jpg)
13
More parallel languages
• ZPL - array-based language at UW. Compiles into C code (highly portable).
• C* - C extended for parallelism
![Page 14: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/14.jpg)
14
Object-Oriented
• Concurrent Smalltalk
• Threads in Java and Ada; thread libraries for use in C/C++
  – This uses a library of parallel routines
![Page 15: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/15.jpg)
15
Functional
• NESL, Multilisp
• Id & Sisal (more dataflow)
![Page 16: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/16.jpg)
16
Parallelizing Compilers
Automatically transform a sequential program into a parallel program.
1. Identify loops whose iterations can be executed in parallel.
2. Often done in stages.
Q: Which loops can be run in parallel?
Q: How should we distribute the work/data?
![Page 17: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/17.jpg)
17
Data Dependences
Flow dependence - RAW. Read-After-Write. A "true" dependence. Read a value after it has been written into a variable.
Anti-dependence - WAR. Write-After-Read. Write a new value into a variable after the old value has been read.
Output dependence - WAW. Write-After-Write. Write a new value into a variable and then later on write another value into the same variable.
![Page 18: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/18.jpg)
18
Example
1: A = 90;
2: B = A;
3: C = A + D;
4: A = 5;
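Annotation added here (not on the original slide), applying the definitions from the previous slide to this example:

```c
int main(void)
{
    int A, B, C, D = 7;    /* D given an arbitrary value for illustration */

    A = 90;                /* S1: writes A                                 */
    B = A;                 /* S2: flow (RAW) dependence on S1 (reads A)    */
    C = A + D;             /* S3: flow (RAW) dependence on S1              */
    A = 5;                 /* S4: anti (WAR) dependences on S2 and S3,
                              output (WAW) dependence on S1 (both write A) */

    return A + B + C;      /* use the values so the compiler keeps them    */
}
```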
![Page 19: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/19.jpg)
19
Dependencies
A parallelizing compiler must identify loops that do not have dependences BETWEEN ITERATIONS of the loop.
Example:
do I = 1, 1000
  A(I) = B(I) + C(I)
  D(I) = A(I)
end do
![Page 20: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/20.jpg)
20
Example
Fork one thread for each processor
Each thread executes the loop:
do I = my_lo, my_hi
A(I) = B(I) + C(I)
D(I) = A(I)
end do
Wait for all threads to finish before proceeding.
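Not the original code, but a self-contained C sketch (using POSIX threads) of the pattern described on this slide: fork one thread per processor, give each thread its own my_lo/my_hi chunk of the iteration space, and join all threads before proceeding. The processor count and input values are assumptions for illustration; compile with -pthread.

```c
#include <pthread.h>
#include <stdio.h>

#define N      1000
#define NPROCS 4                        /* assumed processor count */

static double A[N], B[N], C[N], D[N];

struct range { int my_lo, my_hi; };     /* one chunk of the iteration space */

static void *body(void *arg)
{
    struct range *r = arg;
    for (int i = r->my_lo; i < r->my_hi; i++) {
        A[i] = B[i] + C[i];
        D[i] = A[i];
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NPROCS];
    struct range r[NPROCS];
    int chunk = N / NPROCS;

    for (int i = 0; i < N; i++) {       /* arbitrary input values */
        B[i] = i;
        C[i] = 1.0;
    }

    /* fork one thread per processor, each with its own chunk */
    for (int p = 0; p < NPROCS; p++) {
        r[p].my_lo = p * chunk;
        r[p].my_hi = (p == NPROCS - 1) ? N : (p + 1) * chunk;
        pthread_create(&tid[p], NULL, body, &r[p]);
    }

    /* wait for all threads to finish before proceeding */
    for (int p = 0; p < NPROCS; p++)
        pthread_join(tid[p], NULL);

    printf("D[N-1] = %g\n", D[N - 1]);  /* expect 999 + 1 = 1000 */
    return 0;
}
```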
![Page 21: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/21.jpg)
21
Another Example
do I = 1, 1000
A(I) = B(I) + C(I)
D(I) = A(I+1)
end do
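Reasoning added here (not on the original slide): iteration I reads A(I+1), which iteration I+1 later overwrites, so the loop carries an anti (WAR) dependence between iterations and cannot simply be split across processors as written. A hedged C sketch of one common fix, reading from a copy of A taken before the loop:

```c
#include <stdio.h>

#define N 1000

int main(void)
{
    static double A[N + 1], Aold[N + 1], B[N], C[N], D[N];

    /* snapshot the old values of A so the anti-dependence disappears */
    for (int i = 0; i <= N; i++)
        Aold[i] = A[i];

    /* each iteration now touches disjoint data and could run in parallel */
    for (int i = 0; i < N; i++) {
        A[i] = B[i] + C[i];
        D[i] = Aold[i + 1];
    }

    printf("D[0] = %g\n", D[0]);
    return 0;
}
```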
![Page 22: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/22.jpg)
22
Yet Another Example
do I = 1, 1000
A( X(I) ) = B(I) + C(I)
D(I) = A( X(I) )
end do
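Reasoning added here (not on the original slide): whether this loop can run in parallel depends on the index array X. If X maps distinct iterations to distinct elements, the iterations are independent; if two iterations share an X(I) value, they read and write the same element of A, and the loop carries dependences. A compiler usually cannot decide this at compile time, so such loops are often left sequential or checked at run time, as in this hedged C sketch:

```c
#include <stdio.h>

#define N 1000

/* Returns 1 if x[0..n-1] holds n distinct values in [0, n), i.e. no two
 * iterations of the loop above would touch the same element of A.
 * Illustrative helper only; requires n <= N. */
static int is_permutation(const int *x, int n)
{
    char seen[N] = {0};
    for (int i = 0; i < n; i++) {
        if (x[i] < 0 || x[i] >= n || seen[x[i]])
            return 0;
        seen[x[i]] = 1;
    }
    return 1;
}

int main(void)
{
    static int X[N];
    for (int i = 0; i < N; i++)
        X[i] = (i * 7) % N;     /* 7 and 1000 are coprime: a permutation */
    printf("safe to parallelize: %d\n", is_permutation(X, N));
    return 0;
}
```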
![Page 23: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/23.jpg)
23
Parallel Compilers
• Two concerns:
• Parallelizing code
  – Compiler will move code around to uncover parallel operations
• Data locality
  – If a parallel operation has to get data from another processor’s memory, that’s bad
![Page 24: Parallel Programming](https://reader035.fdocuments.us/reader035/viewer/2022062322/568148f3550346895db6104d/html5/thumbnails/24.jpg)
24
Distributed computing
• Take a big task that has natural parallelism
• Split it up among many different computers across a network
• Examples: SETI@Home, prime number searches, Google Compute, etc.
• Distributed computing is a form of parallel computing