Instructor: Neelima Gupta, ngupta@cs.du.ac.in. Table of Contents. Parallel Algorithms.
Table of Contents
Parallel Algorithms
Thanks to: Tejinder Kaur (35, MCS '09)
Instructor: Ms Neelima Gupta
Solving a problem on multiple processors.
S(n) is the sequential time to solve a problem of size n.
T(n,p) is the parallel time to solve the problem on p processors.
W(n) is the work done by a parallel algorithm: W(n) = T(n,p) · p.
A parallel algorithm is optimal if the work done matches the time of the best known sequential algorithm, i.e. if W(n) = S(n) up to a constant factor.
Speedup is how much time is gained by using more processors: speedup = S(n)/T(n,p).
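The definitions above can be written as small helper functions. This is only an illustrative sketch: the function names and the `slack` parameter (the constant factor allowed when comparing work with sequential time) are assumptions, not part of the slides.

```python
def work(parallel_time, processors):
    """W(n) = T(n,p) * p."""
    return parallel_time * processors


def speedup(sequential_time, parallel_time):
    """speedup = S(n) / T(n,p)."""
    return sequential_time / parallel_time


def is_work_optimal(sequential_time, parallel_time, processors, slack=1.0):
    """True if W(n) is within a constant factor `slack` of S(n)."""
    return work(parallel_time, processors) <= slack * sequential_time


# Ten numbers summed on two processors: S(n) = 10 steps, T(n,2) = 5 steps.
print(work(5, 2))                # 10
print(speedup(10, 5))            # 2.0
print(is_work_optimal(10, 5, 2)) # True
```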
Take the problem of computing the sum of n numbers.
Sequential time = Θ(n)
We have 2 processors p1 and p2, and the numbers are 2, 3, 4, 5, 1, 11, 13, 10, 7, 8.
Initially all the numbers are with p1, and it sends half of them to p2. Both p1 and p2 compute their sums and send the sums s1 and s2 to each other, so both have the final sum.
p1: 2, 3, 4, 5, 1 → s1, then (s1+s2)
p2: 11, 13, 10, 7, 8 → s2, then (s1+s2)
Communication time = Θ(1)
Computation time = Θ(n/2)
T(n,2) = Θ(n/2)
W(n) = (n/2) · 2 = n
Hence this algorithm is optimal.
Speedup = n / (n/2) = 2
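The two-processor scheme can be simulated sequentially as follows. This is a minimal sketch: the function name is hypothetical, and the two "processors" are just two slices of one list.

```python
def two_processor_sum(values):
    """Simulate the two-processor sum: p1 keeps the first half, p2 gets the rest."""
    half = len(values) // 2
    p1_values, p2_values = values[:half], values[half:]  # p1 sends half to p2
    s1 = sum(p1_values)  # computed by p1
    s2 = sum(p2_values)  # computed by p2 at the same time
    # The processors exchange s1 and s2, so both end up with the total.
    return s1 + s2, s2 + s1


numbers = [2, 3, 4, 5, 1, 11, 13, 10, 7, 8]
print(two_processor_sum(numbers))  # (64, 64)
```

Here s1 = 15 and s2 = 49, so both processors finish with 64.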
PARALLEL MODELS
Distributed Computing
There are several independent machines. They communicate with each other by passing messages. The final result is assembled from all the independent machines.
[Figure: five independent machines, M1 to M5]
SHARED MEMORY MODEL
All the processors read from and write to the same memory. There is no direct communication between them. Processors cannot write to the same location at the same time, but they can read at the same time.
[Figure: processors p1, p2, p3, ..., pn attached to one shared memory]
Models for concurrency in the shared memory model
EREW (Exclusive Read, Exclusive Write)
CREW (Concurrent Read, Exclusive Write)
CRCW (Concurrent Read, Concurrent Write)
The weakest is EREW. CREW is stronger than EREW but weaker than CRCW.
If we go from CRCW to CREW, there is a slowdown by a factor of log(n).
Made By : Deepika Kamboj ( Roll No.7, MSc '11 )
Searching for a key
[Figure: elements x1, x2, x3, ..., xi, ..., xn are held by processors p1, p2, p3, ..., pi, ..., pn. In the comparison row, each processor pi tests xi = key. The output cell holds 0.]
Thanks to 'PREETI'
CRCW
[Figure: elements x1, x2, x3, ..., xi, ..., xn are held by processors p1, p2, p3, ..., pi, ..., pn, and each processor pi tests xi = key. When a match is found, the output cell changes from 0 to 1.]
VERSION 1 OF SEARCHING
To find the existence of the given key.
Model used:
CRCW Common Priority Arbitrary
Example for version 1
[Figure: Key = 7. Six processors p1 to p6 each hold one of the elements 12, 7, 30, 22, 15, 7 and compare it with the key in parallel: 12≠7, 30≠7, 22≠7, 15≠7, and 7=7 at two processors. The output cell starts at 0 and becomes 1 once the matching processors write to it.]
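A sequential simulation of version 1 might look like the sketch below. The function name is hypothetical, and the order of the elements across p1 to p6 is an assumption, since the figure text does not pin it down. Every matching processor writes the same value 1, which is why the Common CRCW rule is enough here.

```python
def search_exists(elements, key):
    """Return 1 if key occurs in elements, else 0 (one simulated parallel step)."""
    output = 0  # shared output cell, initially 0
    # Processor i compares elements[i] with the key; all comparisons are independent.
    writers = [i for i, x in enumerate(elements) if x == key]
    # Common CRCW: concurrent writes are allowed because every writer writes 1.
    if writers:
        output = 1
    return output


# Assumed element order for p1..p6 (illustrative only).
print(search_exists([12, 7, 30, 22, 7, 15], 7))  # 1
print(search_exists([12, 7, 30, 22, 7, 15], 8))  # 0
```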
VERSION 2 OF SEARCHING
To find the processor id.
Model used:
CRCW Common Priority Arbitrary
Example for version 2
[Figure: Key = 7. Six processors p1 to p6 each hold one of the elements 12, 7, 30, 22, 15, 7 and compare it with the key in parallel: 12≠7, 30≠7, 22≠7, 15≠7, and 7=7 at p2 and at p5. The output cell starts at 0. Either p2 or p5 gets written, so the output is p5 in one outcome and p2 in the other.]
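Version 2 can be sketched as a simulation of an Arbitrary CRCW write: several processors try to write different ids, and any one of them may succeed. The function name and the `rng` parameter are hypothetical, the element order is an assumption chosen so that p2 and p5 hold the key, and `random.choice` only stands in for the machine's arbitrary choice.

```python
import random


def search_any_id(elements, key, rng=random):
    """Return the 1-based id of some processor holding key, or 0 if there is none."""
    output = 0  # shared output cell, initially 0
    # Processor i+1 holds elements[i]; every matching processor wants to write its id.
    writers = [i + 1 for i, x in enumerate(elements) if x == key]
    # Arbitrary CRCW: exactly one of the concurrent writes succeeds, chosen arbitrarily.
    if writers:
        output = rng.choice(writers)
    return output


# Assumed element order for p1..p6 (illustrative only); the result is 2 or 5.
print(search_any_id([12, 7, 30, 22, 7, 15], 7))
```

Because the writers store different values, this write pattern would not be legal under the Common rule, which requires all concurrent writers to write the same value.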
VERSION 3 OF SEARCHING
To find the leftmost occurrence of the given key.
Model used:
CRCW Common Arbitrary Priority
×
Example for version 3
[Figure: Key = 7. Six processors p1 to p6 each hold one of the elements 12, 7, 30, 22, 15, 7 and compare it with the key in parallel: 12≠7, 30≠7, 22≠7, 15≠7, and 7=7 at two processors. The output cell starts at 0. P2 has the highest priority among the matching processors, so the output becomes p2.]
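Version 3 can be sketched as a Priority CRCW write in which the lowest-numbered processor wins. The function name is hypothetical and the element order is an assumption chosen so that p2 and p5 hold the key; the rule "lower id means higher priority" matches the example, where p2 beats p5.

```python
def search_leftmost_id(elements, key):
    """Return the 1-based id of the leftmost processor holding key, or 0 if none."""
    output = 0  # shared output cell, initially 0
    # Processor i+1 holds elements[i]; every matching processor wants to write its id.
    writers = [i + 1 for i, x in enumerate(elements) if x == key]
    # Priority CRCW: among concurrent writers, the lowest-numbered processor wins.
    if writers:
        output = min(writers)
    return output


# Assumed element order for p1..p6 (illustrative only).
print(search_leftmost_id([12, 7, 30, 22, 7, 15], 7))  # 2
```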
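SUM PROBLEM
Find the sum of n numbers using n processors.
[Figure: a binary tree over the inputs a1, a2, a3, a4, ..., an. The levels are labelled n processors, n/2 processors, n/4 processors, and so on up to 1 processor at the root.]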
The height of this tree is log n.
Each step takes constant time.
Hence this algorithm takes O(log n) time.
W(n) = n · log n = n log n.
Speedup = n / log n.
This algorithm is not optimal, since its work n log n exceeds the sequential time Θ(n): half of the processors are idle in the first step, and the number of idle processors increases in further steps.
What if we use n/log n processors?
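The tree-based sum can be simulated level by level. In this sketch (the function name is hypothetical), each pass of the loop stands for one parallel step in which every active processor adds one pair; the step counter shows the log n height.

```python
def tree_sum(values):
    """Pairwise tree reduction; returns (total, number_of_parallel_steps)."""
    level = list(values)
    steps = 0
    while len(level) > 1:
        # One parallel step: each active processor adds one adjacent pair.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last element is carried up unchanged
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return (level[0] if level else 0), steps


print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # (36, 3)
```

With n = 8 the reduction finishes in 3 = log 8 steps, while only 4, then 2, then 1 of the 8 processors do any work.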
As the number of processors is n/log n, each processor gets log n values.
Take m = n/log n. Each processor adds up its log n values sequentially, so m partial sums s1, s2, ..., sm are generated.
[Figure: a binary tree built over the partial sums s1, s2, ..., sm]
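The height of the tree is log m, so the tree phase takes log m ≤ log n time.
Adding the log n time each processor needs to sum its own values, T(n,p) ≤ 2 log n = O(log n).
W(n) = T(n,p) · p = O(log n) · (n/log n) = O(n) = O(S(n)), as the sequential time is O(n).
Hence this algorithm is optimal.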
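The two-phase scheme can be sketched as follows. The function name is hypothetical, logarithms are taken base 2 and rounded up (an assumption for inputs whose size is not a power of two), and the returned step count is an upper bound that charges log n time units to the local phase.

```python
import math


def block_then_tree_sum(values):
    """Sum with about n/log n processors; returns (total, processors_used, time_steps)."""
    n = len(values)
    if n == 0:
        return 0, 0, 0
    block = max(1, math.ceil(math.log2(n)))  # each processor gets about log n values
    # Phase 1: every processor sums its own block sequentially (about log n time).
    partial = [sum(values[i:i + block]) for i in range(0, n, block)]
    steps = block
    # Phase 2: tree reduction over the m partial sums (log m <= log n time).
    level = partial
    while len(level) > 1:
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], len(partial), steps


print(block_then_tree_sum(list(range(1, 17))))  # (136, 4, 6)
```

For n = 16 this uses m = 4 processors and 4 + 2 = 6 time steps, within the 2 log n = 8 bound, so the work is about 4 · 6 = 24 instead of 16 · 4 = 64 for the n-processor tree.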
SORTING
Sort n numbers in parallel with n processors. Initially each processor has one element.
[Figure: a merge tree over the inputs a1, a2, a3, a4, ..., an. The levels are labelled "n/2, 2 merge", "n/4, 4 merge", ..., "1, n merge".]
The last step takes n units of time, since one processor merges n elements.
Over all levels: n + n/2 + n/4 + ... + 2 ≤ 2n
So it takes O(n) time.
W(n) = n · O(n) = O(n²)
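The merge tree and its time accounting can be simulated as below. This is a sketch under stated assumptions: the function name is hypothetical, each merge is done sequentially by one processor, a level is charged the length of its longest merged run, and the 2n bound is checked only for n a power of two.

```python
from heapq import merge


def parallel_merge_sort(values):
    """Bottom-up merge sort; returns (sorted_list, parallel_time_units)."""
    runs = [[v] for v in values]  # initially each processor holds one element
    time_units = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]
            merged.append(list(merge(*pair)))  # one processor merges one pair
        # Merges on the same level run in parallel, so the level costs
        # as much as its longest merged run.
        time_units += max(len(run) for run in merged)
        runs = merged
    return (runs[0] if runs else []), time_units


print(parallel_merge_sort([5, 2, 7, 1, 8, 3, 6, 4]))
# ([1, 2, 3, 4, 5, 6, 7, 8], 14)
```

For n = 8 the levels cost 2 + 4 + 8 = 14 ≤ 2n = 16 time units, and the last merge alone accounts for n of them.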
Thanks to: Surbhi Tripathi (27, MCS '09)
Definition: Prefix Sums
Given: a set of n values A = {a0, a1, ..., an-1}
We want to find the prefix sums S0, S1, ..., Sn-1, where
S0 = a0
S1 = a1 + a0
...
Sn-1 = an-1 + ... + a1 + a0
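As a reference point, the definition can be computed sequentially in Θ(n) time. This sketch uses `itertools.accumulate` from the Python standard library; the function name is hypothetical.

```python
from itertools import accumulate


def prefix_sums(values):
    """Return [S0, S1, ..., Sn-1] with Si = a0 + a1 + ... + ai."""
    return list(accumulate(values))


print(prefix_sums([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```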
STEP - II
Input: a0 a1 a2 a3 a4 a5 a6 a7
Held so far: P1: s1, P2: a2∘a3, P3: a4∘a5, P4: a6∘a7
Computed in this step:
p1: s2 = s1∘a2
p2: s3 = s1∘a2∘a3
p3: a4∘a5∘a6
p4: a4∘a5∘a6∘a7

STEP - III
Computed in this step:
p1: s4 = s3∘a4
p2: s5 = s3∘a4∘a5
p3: s6 = s3∘a4∘a5∘a6
p4: s7 = s3∘a4∘a5∘a6∘a7
CREW Model
Computations of prefix sums do not require any concurrent writes.
TIME COMPLEXITY
To compute the prefix sums of n numbers:
The number of prefix sums computed doubles at each step, so computing n prefix sums gives a tree of height log n.
Each step takes constant time.
So computing n prefix sums using n processors in parallel takes O(log n) time.
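The logarithmic step count can be illustrated with a standard recursive-doubling scan (often called a Hillis-Steele scan). This is a swapped-in formulation, not necessarily the exact processor schedule shown in the steps above, and the function name is hypothetical. Each pass of the loop stands for one parallel step; all reads use the values from the previous step, and every position is written by only one processor, so no concurrent writes are needed.

```python
def parallel_prefix_sums(values):
    """Recursive-doubling scan; returns (prefix_sums, number_of_parallel_steps)."""
    sums = list(values)
    n = len(sums)
    steps = 0
    offset = 1
    while offset < n:
        # One parallel step: position i >= offset adds the old value at i - offset.
        # Building a new list keeps reads on the previous step's values.
        sums = [sums[i] + sums[i - offset] if i >= offset else sums[i]
                for i in range(n)]
        offset *= 2
        steps += 1
    return sums, steps


print(parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
# ([1, 3, 6, 10, 15, 21, 28, 36], 3)
```

For n = 8 the scan finishes in 3 = log 8 steps, matching the tree height in the analysis.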