CA- UNIT8


UNIT-8 FORMS OF PARALLELISM

8.1 SIMPLE PARALLEL COMPUTATION:

Example 1: Numerical integration over two variables. Consider a simple example that illustrates the role of parallel processing in algorithms: the double integration of a function of two variables over a rectangular region of the X-Y plane, evaluated numerically using a simple parallel algorithm.

A continuous function f(X,Y) of two variables X and Y defines a volume in the three-dimensional space formed by the axes X, Y, Z, with Z = f(X,Y). This volume is determined by the integral:

V = \int_{Y_{min}}^{Y_{max}} \int_{X_{min}}^{X_{max}} f(X,Y) \, dX \, dY        ... (1.1)

where the appropriate limits on X and Y have been taken as Xmin, Xmax, Ymin and Ymax respectively. When such an integral is to be evaluated on a computer, the X and Y axes can be divided into intervals of length ΔX and ΔY respectively, and the integral is replaced by the following summation:

V ≈ \sum \sum f(X,Y) \, \Delta X \, \Delta Y

The function f(X,Y) must be evaluated at an appropriate point, for example the midpoint, within each area element of size (ΔX)(ΔY).

Figure: Double integration of f(X,Y) over a rectangular region of the X-Y plane.

The numbers of intervals along the X and Y axes are, respectively, Nx = (Xmax - Xmin)/ΔX and Ny = (Ymax - Ymin)/ΔY. One possible parallelized version to solve Eqn (1.1) is shown below:

1. For each of the Nx × Ny area elements, in parallel, calculate the value of the function f(X,Y) at the midpoint of the area element.
2. For each of the Ny rows, in parallel, calculate the summation of f(X,Y) at the Nx points along the row; denote this summation as the respective row total. This is the inner summation.
3. Calculate the sum of the Ny row totals found in step 2; this is the outer summation.
4. Multiply the sum of step 3 by (ΔX)(ΔY).

Nx × Ny processors are working in parallel in step 1.

Barrier synchronization: step 2 should not start until all processors have completed step 1, and similarly step 3 should not start until all the processors involved have completed step 2. This type of synchronization between processors (or processes) is known as barrier synchronization.

Let us assume that we have a square grid over which the double integration is to be performed, i.e. Nx = Ny = N. The steps of the parallel algorithm can then be summarized as follows:

Function evaluation: for the evaluation of f(X,Y) at each grid element, we use N² processors, and the time taken is independent of N.

Row totals: with N processors used for each row, the N row totals can be calculated in parallel in O(log N) time steps.

Final sum: the final sum is calculated using N processors in O(log N) time steps.

Thus, overall, with N² processors, the computation of the double integration is performed in time O(log N).
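The four steps above can be sketched as a sequential program; the two comprehensions below mark exactly the places where the parallel version would fan out across the area elements and the rows. This is a minimal sketch, not a parallel implementation, and the function and limits passed in are illustrative.

```python
def double_integral(f, xmin, xmax, ymin, ymax, nx, ny):
    """Midpoint-rule approximation of the double integral of f over
    [xmin, xmax] x [ymin, ymax], following the four steps in the text."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny

    # Step 1: evaluate f at the midpoint of each of the nx*ny area
    # elements (done on nx*ny processors in parallel in the text).
    values = [[f(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
               for i in range(nx)] for j in range(ny)]

    # Step 2: inner summation -- one row total per row
    # (done in parallel over the ny rows in the text).
    row_totals = [sum(row) for row in values]

    # Step 3: outer summation over the ny row totals.
    total = sum(row_totals)

    # Step 4: scale by the area of one element.
    return total * dx * dy

# The midpoint rule is exact for linear integrands: the integral of
# x + y over the unit square is 1.
print(double_integral(lambda x, y: x + y, 0.0, 1.0, 0.0, 1.0, 8, 8))
```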

Example 2: Addition of N numbers using parallel processors

Addition of N numbers on a single processor takes N - 1 addition steps. On multiple processors operating in parallel, the same addition of N numbers can be done more efficiently. Let us consider the addition of N = 8 numbers on 4 processors. Assume that the numbers a0, a1, ..., a7 are distributed over eight processors P0, P1, ..., P7.

Step 1: Do in parallel: a0 + a4 --> a0, a1 + a5 --> a1, a2 + a6 --> a2, a3 + a7 --> a3

Note that a0 + a4 --> a0 means that the operand a4 is made available from processor P4 to processor P0 using some mechanism of inter-processor communication; operand a0 is already present in processor P0, and therefore the result of the addition is also then available in processor P0.

Step 2: Do in parallel: a0 + a2 --> a0, a1 + a3 --> a1

Step 3: a0 + a1 --> a0
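The three steps can be traced with a list standing in for the processors; slot a[i] plays the role of the operand held by processor Pi. This is a sequential trace of the schedule, not real inter-processor communication, and the eight input values are arbitrary.

```python
# a[i] stands in for the operand held by processor Pi.
a = [3, 1, 4, 1, 5, 9, 2, 6]

# Step 1 (4 additions in parallel): a0+a4->a0, a1+a5->a1, a2+a6->a2, a3+a7->a3
for i in range(4):
    a[i] = a[i] + a[i + 4]

# Step 2 (2 additions in parallel): a0+a2->a0, a1+a3->a1
for i in range(2):
    a[i] = a[i] + a[i + 2]

# Step 3 (1 addition): a0+a1->a0
a[0] = a[0] + a[1]

print(a[0])   # the sum of the eight numbers, available after 3 time steps
```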

Figure: Inter-processor communication in the above algorithm.

We see that four additions take place in parallel in step 1, two additions in step 2, and a single addition in step 3. Barrier synchronization is needed between steps. The sum of the eight numbers is available in a0 after three time steps, and the degree of parallelism is 4, since that is the maximum number of parallel operations we carried out (in step 1).

Let us assume that, in general, N = 2^k for some integer k, i.e. N is a power of 2. We can easily verify that:

1. In the above example, at the end of three time steps, variable a0 in processor P0 does indeed hold the sum of the eight operands originally given to us.
2. In general, for N = 2^k values to be added, the number of time steps required will be k = log2 N.

The following figure gives another depiction of the algorithm; the inter-processor communication occurs in the pattern of a binary tree.
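The binary-tree pattern generalizes to any N = 2^k: halving the number of active slots on each step gives exactly k = log2 N time steps. A minimal sketch (again with list indices standing in for processors, and the step counter marking where a barrier synchronization would occur):

```python
import math

def tree_sum(values):
    """Reduce a list of length N = 2**k by pairwise additions.
    Returns (sum, number_of_time_steps)."""
    a = list(values)      # work on a copy
    n = len(a)
    steps = 0
    while n > 1:
        n //= 2
        # one time step: these n additions happen "in parallel"
        for i in range(n):
            a[i] = a[i] + a[i + n]
        steps += 1        # a barrier synchronization would go here
    return a[0], steps

total, steps = tree_sum(range(16))   # N = 16 = 2**4
print(total, steps)
assert steps == int(math.log2(16))
```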

Figure: Another depiction of the algorithm for adding N numbers using parallel processors.

When an associative operation is carried out in this way on a multiprocessor system, it is known as a reduce, or reduction, operation.

8.3 PARALLEL ALGORITHMS:

The complexity of any sequential algorithm is defined in terms of the asymptotic running time of the algorithm on a problem instance of size n. The complexity is expressed in big-Oh, or order, notation. For example, O(t(n)) means that, for n > n0, the running time of the algorithm grows as k·t(n) for some constants n0 and k.

For a problem instance of size n, assume that an algorithm uses p(n) processors in parallel and has running time in O(t(n)). Then the work performed by the algorithm on a problem instance of size n is defined as w(n) = O(p(n)·t(n)). The work performed by a parallel algorithm can also be referred to as the cost of the algorithm.

Consider two different parallel algorithms, say I and II, for solving a given problem. In solving a problem instance of size n, let these two algorithms perform work W_I(n) = O(p_I(n)·t_I(n)) and W_II(n) = O(p_II(n)·t_II(n)), respectively. We say that algorithm I is work-efficient with respect to algorithm II if W_I(n) is in O(W_II(n)), i.e. W_I(n) is of the order of W_II(n).
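As a worked example of these definitions (applied here to the tree reduction of Section 8.1; this example is not from the text), consider adding n numbers:

```latex
% Tree reduction with n processors:
%   p(n) = n,  t(n) = O(\log n)
w(n) = O\bigl(p(n)\,t(n)\bigr) = O(n \log n)
% The best sequential algorithm performs n - 1 additions:
T(n) = O(n)
% So the n-processor reduction is NOT work-efficient with respect to it.
% If instead each of p(n) = n/\log n processors first sums \log n values
% sequentially and the partial sums are then tree-reduced, the time is
% still t(n) = O(\log n), and the work becomes
w(n) = O\!\left(\tfrac{n}{\log n} \cdot \log n\right) = O(n)
% which matches the sequential work, so this version is work-efficient.
```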

A deterministic sequential algorithm is considered efficient if its running time T(n) is polynomial in n; for example, bubble sort has running time in O(n²). A parallel algorithm is said to be efficient if, for solving a problem of size n, it satisfies the following two conditions:

1. The number of processors p(n) used is in O(n^a) for some constant a, i.e. the number of processors required is polynomial in n, and
2. The running time of the algorithm t(n) is in O((log n)^b) for some constant b, i.e. the running time of the algorithm is polylogarithmic in n.

An optimal parallel algorithm is defined as one which is work-efficient with respect to the best possible sequential algorithm for solving the problem.

BRENT'S THEOREM: For a given problem, suppose that there exists a parallel algorithm which solves a problem instance of size n using p(n) processors in time O(t(n)), and if q(n)