
Lecture 2: Parallel computational models

Sequential computational models

• Turing machine
• RAM (see figure)
• Logic circuit model

RAM (Random Access Machine)

Each of the following operations is assumed to execute in one unit of time:

(1) Control operations such as if and goto (for and while can be realized with if and goto).
(2) I/O operations such as print.
(3) Assignment operations such as a = b.
(4) Arithmetic and logic operations such as +, -, AND.

[Figure: a CPU, whose control unit contains the algorithm, connected to a memory of unlimited size that holds the data.]


O-notation for computing complexity

Definition

Assume that f(n) and g(n) are positive functions. If there are two positive constants c and n0 such that

f(n) ≤ c g(n) for all n ≥ n0,

then we say

f(n) = O(g(n)).

For example,

• 3n^2 - 5n = O(n^2)
• n log n + n = O(n log n)
• 45 = O(1)

(In each case, only the term that grows most quickly matters.)

[Figure: plot of f(n) and c g(n) against n; the curve c g(n) lies above f(n) for all n ≥ n0.]
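The definition can be spot-checked numerically. The sketch below is illustrative only: the helper name `holds_from`, the chosen constants c and n0, and the upper bound `n_max` are assumptions made for this example, and a finite check is evidence, not a proof.

```python
import math


def holds_from(f, g, c, n0, n_max=10_000):
    """Return True if f(n) <= c * g(n) for every integer n0 <= n <= n_max."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))


# 3n^2 - 5n = O(n^2): c = 3 works, because 3n^2 - 5n <= 3n^2 for n >= 1.
# n0 = 2 is used so that f(n) is positive on the checked range.
ex1 = holds_from(lambda n: 3 * n * n - 5 * n, lambda n: n * n, c=3, n0=2)

# n log n + n = O(n log n): c = 2, n0 = 2 works, because n <= n log2(n) once n >= 2.
ex2 = holds_from(lambda n: n * math.log2(n) + n,
                 lambda n: n * math.log2(n), c=2, n0=2)

# 45 = O(1): c = 45, n0 = 1.
ex3 = holds_from(lambda n: 45, lambda n: 1, c=45, n0=1)

# A negative case: n^2 <= 100 n fails as soon as n > 100, so c = 100 is not enough.
bad = holds_from(lambda n: n * n, lambda n: n, c=100, n0=1)
```

For the negative case no constant works at all, since n^2 / n grows without bound; the check only shows that one particular choice of c fails.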

Algorithm analysis for sequential and parallel algorithms

                 Sequential algorithms    Parallel algorithms
Models           RAM                      Many types
Data division    Not necessary            Most important
Analysis         Computing time           Computing time
                 Memory size              Communication time
                                          Number of processors


PRAM (Parallel RAM) model

A PRAM consists of a number of RAMs (Random Access Machines) and a shared memory. Each RAM has a unique processor number.

• Processors act synchronously.
• All processors execute the same program. (By branching on the processor number, different processors can execute different operations.)
• Data communication between processors (RAMs) goes through the shared memory.
• Each processor can write data to, or read data from, one memory cell in O(1) time.

[Figure: RAM 1, RAM 2, ..., RAM m, all connected to the shared memory.]


Features of PRAM

Merits
• The inherent parallelism of a problem can be studied directly.
• Algorithms can be designed easily.
• Algorithms can easily be converted to ones for other parallel computational models.

Demerits
• Communication cost is not considered. (It is not realistic that a synchronized read or write of the shared memory can be done in one unit of time.)
• Distributed memory is not considered.

In the following, we use the PRAM to discuss parallel algorithms.


Analysis of parallel algorithms on PRAM model

• Computing time: T(n)
• Number of processors: P(n)
• Cost: P(n) × T(n)
• Speed-up: Ts(n)/T(n), where Ts(n) is the computing time of the optimal sequential algorithm

1. Cost optimal parallel algorithms: the cost is of the same order as the computing time of the optimal sequential algorithm, i.e., the speed-up is of the same order as the number of processors.
2. Time optimal parallel algorithms: the fastest possible when using a polynomial number of processors.
3. Optimal parallel algorithms: both cost optimal and time optimal.
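The relation between cost and speed-up can be made concrete with a small calculation. This is a minimal sketch with hypothetical numbers (a sequential time of 1024 units, and a parallel algorithm that takes 16 units on 64 processors); the function names `cost` and `speed_up` are chosen for this example, and real analyses compare orders of growth, not exact values.

```python
def cost(num_processors, parallel_time):
    """Cost = P(n) x T(n)."""
    return num_processors * parallel_time


def speed_up(sequential_time, parallel_time):
    """Speed-up = Ts(n) / T(n)."""
    return sequential_time / parallel_time


# Hypothetical measurements for one input size.
ts = 1024   # time of the optimal sequential algorithm
t = 16      # time of the parallel algorithm
p = 64      # number of processors used

c = cost(p, t)         # 64 * 16 = 1024, equal to ts
s = speed_up(ts, t)    # 1024 / 16 = 64.0, equal to p

# With the same sequential time but 128 processors, the cost doubles
# while the speed-up stays at 64, so half the processor time is wasted.
c_wasteful = cost(128, t)
```

Here the cost equals the sequential time exactly when the speed-up equals the number of processors, which is the equivalence stated in item 1.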


NC-class and P-class

World of sequential computation
• P problems: the class of problems which can be solved in polynomial time (O(n^k) for some constant k).
• NP problems: the class of problems which can be solved nondeterministically in polynomial time.
• NP-complete problems: the problems in NP to which every problem in NP can be reduced in polynomial time.
• Open question: P = NP?

World of parallel computation
• NC problems: the class of problems which can be solved in polylogarithmic time (O(lg^k n) for some constant k) using a polynomial number of processors.
• P-complete problems: the problems in P to which every problem in P can be reduced; they are believed not to be NC problems.
• Similarly, open question: NC = P?


An Example of PRAM Algorithms

Problem: Find the sum of n integers (x1, x2, ..., xn).
- Assume that the n integers are stored in array A[1..n] in the shared memory.
- To simplify the problem, let n = 2^k (k is an integer).

main ()
{
    for (h = 1; h ≤ log n; h++) {
        if (index of processor i ≤ n/2^h) processor i do
        {
            a = A[2i-1];  /* Reading from the shared memory */
            b = A[2i];    /* Reading from the shared memory */
            c = a + b;
            A[i] = c;     /* Writing to the shared memory */
        }
    }
    if (index of processor == 1) printf("%d\n", c);
}
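The pseudocode above can be simulated sequentially to check that it produces the sum in log n steps. This is a sketch, not a real parallel program: the function name `pram_sum` is chosen for this example, and synchronous execution is modeled by letting all active processors read before any of them writes.

```python
def pram_sum(values):
    """Simulate the PRAM summation; return (sum, number of parallel steps).

    Assumes len(values) is a power of two, as in the text (n = 2^k).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    A = [None] + list(values)      # 1-based indexing, matching A[1..n]
    steps = n.bit_length() - 1     # log2(n) for a power of two
    for h in range(1, steps + 1):
        active = range(1, n // 2**h + 1)   # processors 1 .. n/2^h
        # Read phase: every active processor i reads A[2i-1] and A[2i].
        reads = {i: (A[2 * i - 1], A[2 * i]) for i in active}
        # Write phase: every active processor i writes the sum to A[i].
        for i, (a, b) in reads.items():
            A[i] = a + b
    return A[1], steps


total, steps = pram_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

For the eight inputs 1 to 8, the array prefix evolves as 3, 7, 11, 15 after step 1, then 10, 26 after step 2, and 36 after step 3.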


Processor Pi reads A[2i-1] and A[2i] from the shared memory, then writes their sum to A[i] in the shared memory.

[Figure: array A[1..n] in the shared memory; P1 reads A[1] and A[2], P2 reads A[3] and A[4], and in general Pi reads A[2i-1] and A[2i] and writes A[i].]


Find the sum of 8 integers (x1, x2, ..., x8).

[Figure, sequential algorithm: starting from the inputs x1, ..., x8, processor P1 adds one input at a time and produces the output after 7 steps (Step 1 to Step 7).]

[Figure, parallel algorithm: in Step 1, P1, P2, P3 and P4 add the input pairs (x1, x2), (x3, x4), (x5, x6) and (x7, x8); in Step 2, P1 and P2 add the partial sums; in Step 3, P1 produces the output.]


Analysis of the algorithm

• Computing time: the for loop is repeated log n times, and each iteration can be executed in O(1) time → O(log n) time
• Number of processors: not larger than n/2 → n/2 processors
• Cost: n/2 × O(log n) = O(n log n)

It is not cost optimal, since the optimal sequential algorithm runs in Θ(n) time.

Classification of PRAM by the access restriction

• EREW (Exclusive Read Exclusive Write) PRAM
  Both concurrent reading and concurrent writing are prohibited.

• CREW (Concurrent Read Exclusive Write) PRAM
  Concurrent reading is allowed, but concurrent writing is prohibited.

[Figure: diagrams of processors P1 and P2 reading from and writing to shared memory cells under each model.]

• CRCW (Concurrent Read Concurrent Write) PRAM
  Both concurrent reading and concurrent writing are allowed.

  It is further classified by how concurrent writes to the same cell are resolved:

  - common CRCW PRAM
    Concurrent writing is allowed only if all the values being written are the same.

  - arbitrary CRCW PRAM
    An arbitrary one of the values being written is stored.

  - priority CRCW PRAM
    The value written by the processor with the smallest number is stored.
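The three CRCW variants differ only in how one memory cell resolves several simultaneous writes. The sketch below illustrates the three rules; the function `resolve_write`, its `policy` strings and the `rng` parameter are names invented for this example, and the "arbitrary" rule is modeled here by a random choice, which is only one of many valid behaviors.

```python
import random


def resolve_write(writes, policy, rng=random):
    """Return the value stored in a cell after one concurrent write.

    writes: dict mapping processor number -> value written to the same cell.
    policy: "common", "arbitrary" or "priority".
    """
    if not writes:
        raise ValueError("no processor is writing")
    if policy == "common":
        values = set(writes.values())
        if len(values) != 1:
            # On a common CRCW PRAM this situation is not allowed to occur.
            raise ValueError("common CRCW requires identical values")
        return values.pop()
    if policy == "arbitrary":
        # Any one of the written values may be stored.
        return rng.choice(list(writes.values()))
    if policy == "priority":
        # The processor with the smallest number wins.
        return writes[min(writes)]
    raise ValueError("unknown policy: " + policy)


stored_priority = resolve_write({3: 7, 1: 9, 2: 7}, "priority")
stored_common = resolve_write({1: 5, 2: 5}, "common")
stored_arbitrary = resolve_write({3: 7, 1: 9}, "arbitrary")
```

An algorithm written for the arbitrary model must be correct whichever value is stored, so it should not rely on the random choice made in this sketch.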


Algorithms on different PRAM models

Algorithms for calculating the AND of n bits (the input is stored in array A[1..n]; the result is left in A[1])

• Algorithm on the EREW PRAM model: O(log n) time, n/2 processors

main ()
{
    for (h = 1; h ≤ log n; h++) {
        if (index of processor i ≤ n/2^h) processor i do {
            a = A[2i-1];
            b = A[2i];
            if ((a == 1) and (b == 1)) A[i] = 1;
            else A[i] = 0;  /* overwrite the stale value left in A[i] */
        }
    }
}

• Algorithm on the common CRCW PRAM model: O(1) time, n processors

main ()
{
    if (A[index of processor i] == 0) processor i do
        A[1] = 0;  /* all writers store the same value 0 */
}

Abilities of PRAM models: EREW < CREW < CRCW
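The difference in step counts between the two models can be checked with a sequential simulation. This is a sketch under stated assumptions: the function names `and_erew` and `and_common_crcw` are invented for this example, n is assumed to be a power of two for the EREW version, each function returns the result together with the number of parallel steps, and every processor with a 0 bit is taken to write 0 to A[1] in the CRCW version so that A[1] ends up holding the AND.

```python
def and_erew(bits):
    """Tree-style AND in log2(n) steps; no cell is read or written twice per step."""
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    A = [None] + list(bits)        # 1-based indexing, matching A[1..n]
    steps = n.bit_length() - 1
    for h in range(1, steps + 1):
        active = range(1, n // 2**h + 1)
        # Read phase, then write phase, to model synchronous processors.
        reads = {i: (A[2 * i - 1], A[2 * i]) for i in active}
        for i, (a, b) in reads.items():
            A[i] = 1 if (a == 1 and b == 1) else 0
    return A[1], steps


def and_common_crcw(bits):
    """One-step AND: every processor whose bit is 0 writes 0 to A[1]."""
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    A = [None] + list(bits)
    writers = [i for i in range(1, n + 1) if A[i] == 0]
    if writers:
        # All writers store the same value, so the common rule is satisfied.
        A[1] = 0
    return A[1], 1


erew_result = and_erew([1, 1, 0, 1])
crcw_result = and_common_crcw([1, 1, 0, 1])
```

Both functions agree on the result; only the number of steps differs (2 versus 1 for four bits), at the price of n processors and concurrent writes in the second version.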


Exercise

1. Suppose that an n × n matrix A and an n × n matrix B are stored in two-dimensional arrays. Design PRAM algorithms for A + B using n processors and n × n processors, respectively. Answer the following questions:

(1) Which PRAM models do you use in your algorithms?

(2) What are the running times?

(3) Are your algorithms cost optimal?

(4) Are your algorithms time optimal?

2. Design a PRAM algorithm for A + B using k processors (k ≤ n × n). Answer the same questions.