-- SOLUTIONS --
King’s College London
This paper is part of an examination of the College counting towards the award
of a degree. Examinations are governed by the College Regulations under the
authority of the Academic Board.
Degree Programmes BSc, MSci
Module Code 6CCS3PAL: IYA
Module Title Parallel Algorithms
Examination Period Example exam 2017
Time Allowed Two hours
Rubric Answer Three of the Five questions.
All questions carry equal marks. If more than three questions
are answered, clearly indicate which answers you would like
to be marked. Write this clearly in the dedicated section on
the front page of the answer booklet.
Answer each question on a new page of your
answer book and write its number in the space
provided.
Calculators Calculators are not permitted
Notes Books, notes or other written material may not be brought
into this examination
DO NOT REMOVE THIS PAPER FROM THE EXAMINATION
ROOM
2017 King's College London
Example exam 2017 6CCS3PAL: IYA
1. a. Compare and contrast the notions of parallel and distributed algorithms.
To what extent do they differ, to what extent are they the same?
[10 marks]
Answer
[marks 5+5]
Open-ended: a chance to see what students think and understand.
Parallel (PRAM): totally authoritarian. Every processor does the same thing,
with shared memory under central control.
Parallel (mesh): still under central control, but with local memory and
local connections.
Distributed: anarchic. Every processor does its own thing, with no central
knowledge of the network structure etc.
b. Define speedup, cost and efficiency measures for parallel algorithms. To
what extent are these measures meaningful?
[8 marks]
Answer
Consider a parallel algorithm using p processors on a problem instance of
size n. Let T*(n) be the worst-case run time of the optimal sequential
algorithm and T_p(n) the run time of the parallel algorithm.
Speedup: T*(n) / T_p(n).
Cost: p T_p(n).
Efficiency: T*(n) / (p T_p(n)).
To what extent are these measures meaningful? The problem is that we often
don't know T*(n).
c. Give a description of the following topics. Your answer should include
an outline of any appropriate algorithms.
i. Parallel and distributed approaches to broadcasting and connectivity
testing in networks. [18 marks]
ii. Parallel computation of a minimum spanning tree and the connected
components of an undirected graph. [14 marks]
Answer
The topics are standard examples which were covered in detail in the
course.
Part i) [marks 9+9]
Part ii) [marks 7+7]
2. a. Explain the Parallel Random Access Machine (PRAM) model with the
help of a diagram. What are the different assumptions that can be made
about reading and writing in this model?
[6 marks]
Answer
The diagram is a sketch of the PRAM network (not given here).
PRAM: synchronous shared-memory model.
PRAM algorithms are of single instruction multiple data (SIMD) type.
4 types:
EREW - Exclusive Read Exclusive Write
CREW - Concurrent Read Exclusive Write
CRCW - Concurrent Read Concurrent Write
ERCW - Exclusive Read Concurrent Write (also possible)
b. The following sub-questions assume the EREW-PRAM model.
i. State in pseudocode a parallel algorithm to initialise an array A[1...n]
with a value Z on an EREW-PRAM by broadcasting. [6 marks]
ii. Use this algorithm to initialise an array of length n = 8 with the item
X, showing the steps in the execution of the algorithm. [6 marks]
iii. Explain how to use this algorithm to test if an item X is contained
in an array B[1...n] and to return the lowest index, if any, of the
location of X in the array. Give an analysis of the running time of
the algorithm. [6 marks]
iv. Assuming that the array A[1...8] is initialised with the item X as
in part (ii) above, show the steps in returning the lowest index
containing this item in the array B = [H, A, X, E, L, A, X, X]. [6 marks]
Answer
(1) By broadcasting on an EREW PRAM we mean copying an item X that is
stored in a specified location of the global memory of an EREW PRAM to
all local memories.
The running time is O(log n), since the number of copies of X doubles in
each of the k rounds.
Let p = 2^k. The final array index is p-1.
BroadcastPRAM(A[0..p-1], X) {
    A[0] = X;
    for (int i = 1; i <= k; i++) {
        for 2**(i-1) <= j <= 2**i - 1 pardo {
            A[j] = A[j - 2**(i-1)];
        }
    }
}
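The doubling pattern can be checked with a short sequential simulation. This is only a sketch in Python: the function name `broadcast_erew` and the snapshot list are illustrative assumptions, and the inner loop stands in for one parallel (pardo) step.

```python
def broadcast_erew(p, x):
    """Simulate the doubling broadcast on an array of length p = 2**k."""
    k = p.bit_length() - 1
    assert 1 << k == p, "this sketch assumes p is a power of two"
    a = [None] * p
    a[0] = x
    snapshots = [list(a)]          # array contents after each round
    for i in range(1, k + 1):
        half = 1 << (i - 1)
        # One parallel step: cell j reads only cell j - half, so every
        # cell is read at most once and written at most once (EREW).
        for j in range(half, 2 * half):
            a[j] = a[j - half]
        snapshots.append(list(a))
    return a, snapshots


final, steps = broadcast_erew(8, "X")
for step in steps:
    print(step)
```

For p = 8 the snapshots show 1, 2, 4 and then 8 filled cells, i.e. k = 3 rounds.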
(2) Use broadcasting: start with X in Temp[1] and double the index range at
each step (1, then 1..2, then 1..4, then 1..8). In this way X is broadcast
to all entries in Temp after log_2 8 = 3 steps. A diagram will do.
(3) Test membership
EREW-IS-IN(X, L) begin
    For i = 1, ..., n do in parallel
    begin
        If L[i] = Temp[i] then
            Temp[i] := i
        else
            Temp[i] := INFINITY
    end
end
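Find the lowest index? Use binary fan-in on Temp: in each round, processor i
writes min(Temp[2i-1], Temp[2i]) to Temp[i], halving the active range. A
diagram will do.
Running time: broadcasting X takes O(log n), the comparison step O(1) and
the fan-in O(log n), so O(log n) overall with n processors.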
(4) Let M denote infinity.
[H, A, X, E, L, A, X, X]
[M, M, 3, M, M, M, 7, 8]   Initialise membership
[M, 3, M, 7]               Pairwise min
[3, 7]                     Pairwise min
[3]                        Pairwise min
Ans: the lowest index of an array entry containing X is 3.
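The membership test followed by the pairwise-minimum fan-in can be sketched sequentially as below. The name `lowest_index` and the returned list of rounds are assumptions made for illustration; each list comprehension stands in for one parallel step, and the sketch uses a fresh list per round rather than updating Temp in place.

```python
import math


def lowest_index(b, x):
    """Return the lowest 1-based index of x in b (math.inf if absent)."""
    n = len(b)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    # Membership step: one comparison per simulated processor.
    temp = [i + 1 if b[i] == x else math.inf for i in range(n)]
    rounds = [temp]
    # Binary fan-in: each round takes pairwise minima and halves the list.
    while len(temp) > 1:
        temp = [min(temp[2 * i], temp[2 * i + 1])
                for i in range(len(temp) // 2)]
        rounds.append(temp)
    return temp[0], rounds


answer, rounds = lowest_index(list("HAXELAXX"), "X")
print(answer)
```

With M read as infinity, the recorded rounds reproduce the four rows of the worked example.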
c. Give an algorithm for the distributed construction of a breadth first
search spanning tree T in a connected graph G.
[10 marks]
Illustrate the construction of the tree T starting from the root vertex 1
in the graph G with vertex set {1, 2, 3, 4, 5, 6, 7, 8} and edges
{1, 2}, {1, 6}, {2, 3}, {2, 5}, {5, 8}, {6, 5}, {3, 4}, {3, 5}, {4, 8}, {8, 7}, {7, 6}.
[10 marks]
Answer
The algorithm used in the lectures:
Initially all vertices have status 'not in tree'.
In round 1 the root changes its status to 'in tree' and sends the message
'join tree' to all its neighbours.
If an unmarked node receives the message 'join tree', it changes its status
to 'in tree' and sends the message 'your child' to the lowest-labelled of
the sending nodes, and 'not your child' to the others.
In the next round it sends 'join tree' to its unmarked neighbours.
If a node has no children (i.e. it didn't receive 'your child' from any
vertex in the current round), then it sends 'terminated' to its parent.
When a node receives 'terminated' from all its children, it sends
'terminated' to its parent.
The process finishes when the root receives 'terminated' from all its
neighbours.
The diagram is shown in file IYA-2018-figures.pdf
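As a rough check of the tree for the graph in the question, the rounds can be simulated centrally. This is a sketch only: the function `distributed_bfs` is an illustrative name, the simulation uses global knowledge of which nodes are marked, and the 'terminated' messages are not modelled.

```python
def distributed_bfs(edges, root):
    """Round-by-round simulation; returns {vertex: parent} for the BFS tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    parent = {root: None}
    frontier = [root]              # nodes sending 'join tree' this round
    while frontier:
        # For every unmarked node, collect the senders it hears from.
        senders = {}
        for u in frontier:
            for v in adj[u]:
                if v not in parent:
                    senders.setdefault(v, []).append(u)
        # A newly reached node replies 'your child' to its lowest-labelled sender.
        for v, us in senders.items():
            parent[v] = min(us)
        frontier = sorted(senders)
    return parent


EDGES = [(1, 2), (1, 6), (2, 3), (2, 5), (5, 8), (6, 5),
         (3, 4), (3, 5), (4, 8), (8, 7), (7, 6)]
tree = distributed_bfs(EDGES, 1)
print(tree)
```

Under these assumptions, vertices 2 and 6 join in round 1, vertices 3, 5 and 7 in round 2 (5 chooses parent 2 over 6), and vertices 4 and 8 in round 3 (8 chooses parent 5 over 7).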
3. a. i. Explain the list ranking problem. Give a CREW-PRAM parallel
algorithm for the list ranking problem. Establish the runtime of your
algorithm on an n element list L, using n processors. [12 marks]
ii. Demonstrate how your algorithm works on a 4 element list. [8 marks]
Answer
[marks for part i) 6+6]
The list-ranking problem is to determine the distance of each node
i from the end of the list.
Algorithm: LIST-RANK
Uses Pointer jumping technique:
Input: A linked list L of size n, or an array S simulating the order of
L.
Output: Distance dist(i), for 1 ≤ i ≤ n of node i from the end.
LIST-RANK(L) BEGIN
    % n: list length
    % next(k): list pointer to next item
    % initialise distance
    For all k \in L in parallel do
    begin
        P(k) := next(k)
        If P(k) =/= k then dist(k) := 1
        else dist(k) := 0
    end
    Repeat log_2 n times
        For all k \in L in parallel do
        begin
            If P(k) =/= P(P(k)) then
            begin
                dist(k) := dist(k) + dist(P(k))
                P(k) := P(P(k))
            end
        end
    For all k \in L in parallel do
        Rank(k) := dist(k)
END
Runtime: each round takes O(1) time with n processors, and the distance
spanned by each pointer doubles per round, so log_2 n rounds suffice and the
algorithm runs in O(log n) time.
Demonstration of the algorithm: log_2 4 = 2, so the algorithm has 2 rounds.
% initialise
k                  1  2  3  4
P(k) := next(k)    2  3  4  4
dist               1  1  1  0
P(P(k))            3  4  4  4
% round 1
P(k) = P(P(k))?    N  N  Y  Y
dist               2  2  1  0
P(k) update        3  4  4  4
P(P(k))            4  4  4  4
% round 2
P(k) = P(P(k))?    N  Y  Y  Y
dist               3  2  1  0
% final rank
Rank(k)            3  2  1  0
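Pointer jumping can be simulated sequentially, as in the Python sketch below. It assumes 0-based node labels and a tail that points to itself; the function name `list_rank` is illustrative. Copying the arrays before each round models the synchronous read-then-write behaviour of the PRAM.

```python
import math


def list_rank(nxt):
    """nxt[k] is the successor of node k (0-based); the tail points to itself."""
    n = len(nxt)
    p = list(nxt)
    dist = [0 if p[k] == k else 1 for k in range(n)]
    for _ in range(max(1, math.ceil(math.log2(n)))):
        # All processors read the old values before any of them writes.
        new_p, new_dist = list(p), list(dist)
        for k in range(n):
            if p[k] != p[p[k]]:
                new_dist[k] = dist[k] + dist[p[k]]
                new_p[k] = p[p[k]]
        p, dist = new_p, new_dist
    return dist


print(list_rank([1, 2, 3, 3]))
```

The 4-element list 0 -> 1 -> 2 -> 3 corresponds to nodes 1..4 in the demonstration and gives the same ranks.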
b. i. Explain what is meant by an associative binary operator, giving
examples. Is the operator max(a, b), which calculates the maximum of
two numbers a, b, an associative binary operator? If so explain why,
giving an example. [5 marks]
ii. An array A contains data entries [x1, x2, ..., xn]. Define the term
prefix sums of A with respect to an associative binary operator ◦.
Let A = [5, 3, 8, 4, 0]. State the prefix sums of A using the operator
max. [5 marks]
Answer
1. An associative binary operator ◦ is binary (C := A ◦ B) and satisfies
(A ◦ B) ◦ C = A ◦ (B ◦ C).
Examples: addition, multiplication, min, max.
Max is associative: the max of 3 numbers is the same whatever the
order of evaluation, e.g. max(max(5, 3), 8) = max(5, max(3, 8)) = 8.
2. Let ⊕ be an associative binary operator.
The prefix sums of a sequence {x1, x2, ..., xn} are the n partial
sums defined by
s_i = x1 ⊕ x2 ⊕ ... ⊕ x_i, 1 ≤ i ≤ n.
For A = [5, 3, 8, 4, 0], the prefix sums using max are [5, 5, 8, 8, 8].
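The prefix sums for max can be checked with `itertools.accumulate` from the Python standard library, which applies a binary function cumulatively. This is a sequential check, not a parallel algorithm.

```python
from itertools import accumulate

A = [5, 3, 8, 4, 0]

# Prefix "sums" with respect to max, and with respect to ordinary addition.
prefix_max = list(accumulate(A, max))
prefix_add = list(accumulate(A))

print(prefix_max)  # [5, 5, 8, 8, 8]
print(prefix_add)  # [5, 8, 16, 20, 20]
```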
c. i. Suppose the binary operator is addition of integers (+). Describe a
CREW-PRAM parallel algorithm to compute the additive prefix sums
of n array values held at locations A(n), ..., A(2n − 1) of an array
A[1 : 2n − 1]. Justify the running time complexity of your answer. [10 marks]
ii. Use a diagram to demonstrate the execution of this algorithm when
computing the prefix sums of the following array: [6, 3, 2, 4, 5, 4, 3, 2].
Give all steps and justify intermediate results of the computation. [10 marks]
Answer
Algorithm Prefix Computation
Input: an array A[1:2n-1] with data at A(n),...,A(2n-1)
Output: an array B[1:2n-1] with prefix sums at B(n),...,B(2n-1)
Assumption: n = 2^m (m = log_2 n)
% upward pass: each internal node stores the sum of its two children
For k = (m-1) step (-1) to 0 do begin
    For all j, 2^k \le j \le 2^{k+1}-1 in parallel do
        A(j) := A(2j) + A(2j+1)
end
B(1) := A(1)
% insert total value of array at root of binary tree
% now come back down the tree, as far as the leaves at level m
For k = 1 step 1 to m do begin
    For all j, 2^k \le j \le 2^{k+1}-1 in parallel do
        If j is odd then
            B(j) := B((j-1)/2)
        If j is even then
            B(j) := B(j/2) - A(j+1)
end
% end of algorithm
Runs in O(log n) time and performs O(n) operations. The algorithm is an
upward pass from the leaves to the root of a balanced binary tree of height
log_2 n, followed by a downward pass back to the leaves. Each level takes
O(1) parallel time, so we have an O(log n) parallel running time.
The diagram is shown in file IYA-2018-figures.pdf
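The two passes can be traced with a sequential Python sketch. It assumes n is a power of two and uses index 0 as unused padding so that the tree indices match A[1:2n-1]; the name `tree_prefix_sums` is illustrative. The downward pass here runs through level m so that the leaf entries B(n),...,B(2n-1) are filled.

```python
def tree_prefix_sums(values):
    """Additive prefix sums via an upward and a downward pass on a binary tree."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    m = n.bit_length() - 1
    a = [0] * (2 * n)              # a[0] unused; leaves live at a[n..2n-1]
    a[n:] = values
    # Upward pass: every internal node gets the sum of its two children.
    for k in range(m - 1, -1, -1):
        for j in range(1 << k, 1 << (k + 1)):      # one parallel step
            a[j] = a[2 * j] + a[2 * j + 1]
    b = [0] * (2 * n)
    b[1] = a[1]                    # the root holds the total
    # Downward pass: a right child inherits its parent's prefix sum,
    # a left child subtracts the sum stored at its right sibling.
    for k in range(1, m + 1):
        for j in range(1 << k, 1 << (k + 1)):      # one parallel step
            if j % 2 == 1:
                b[j] = b[(j - 1) // 2]
            else:
                b[j] = b[j // 2] - a[j + 1]
    return b[n:]


print(tree_prefix_sums([6, 3, 2, 4, 5, 4, 3, 2]))
```

For the array in part ii this yields 6, 9, 11, 15, 20, 24, 27, 29, with the total 29 at the root.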
4. a. Explain the Distributed Memory Interconnection Network model with
the help of a diagram. Outline typical network structures that might
arise.
[10 marks]
Answer
Standard model (see below) + scope for creativity here with the networks.
b. Describe three basic goodness measures for an interconnection network,
and explain why they are considered important.
[10 marks]
Illustrate your answer for the following networks:
i. Complete graph K(p) of p processors. [5 marks]
ii. Two-dimensional mesh M(p, p) of p × p processors. [5 marks]
Answer
Diameter, maximum degree, bisection width (smallest edge cut between
two sets of the same size).
Importance:
Diameter: the longest shortest route between any two processors; bounds
the delay of information exchange.
Maximum degree: with a larger maximum degree, more processors can
communicate directly.
Bisection width: a measure of the communications bottleneck (unfortunately
not so easy to calculate).
Complete graph K(p): diameter 1, max degree p − 1, bisection width
∼ p^2/4 (excellent).
Two-dimensional mesh M(p, p): diameter Θ(p), max degree 4, bisection
width Θ(p) (not too bad, but not brilliant, especially the diameter).
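For very small networks the three measures can be computed by brute force. The Python sketch below uses hypothetical helper names (`complete_graph`, `mesh`, `diameter`, `max_degree`, `bisection_width`). The bisection width is found by exhaustive search over all equal-sized halves, which is only feasible for tiny graphs and illustrates why the measure is hard to calculate in general.

```python
from collections import deque
from itertools import combinations


def complete_graph(p):
    return {u: {v for v in range(p) if v != u} for u in range(p)}


def mesh(p):
    adj = {(i, j): set() for i in range(p) for j in range(p)}
    for i, j in adj:
        for di, dj in ((0, 1), (1, 0)):
            nb = (i + di, j + dj)
            if nb in adj:
                adj[(i, j)].add(nb)
                adj[nb].add((i, j))
    return adj


def diameter(adj):
    best = 0
    for s in adj:                  # breadth first search from every node
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def max_degree(adj):
    return max(len(nbrs) for nbrs in adj.values())


def bisection_width(adj):
    nodes = list(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best


for name, g in (("K(4)", complete_graph(4)), ("M(4,4)", mesh(4))):
    print(name, diameter(g), max_degree(g), bisection_width(g))
```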
c. Suppose the values of an array L of length n^2 are distributed over
a square two-dimensional mesh interconnection network consisting of
p = n^2 processors, in row major order. It is required to determine the
minimum processor index i (if any) for which L(i) is a given item X.
Describe a parallel algorithm to search for X on the mesh, illustrating
your answer using the 3 × 3 mesh given below, with X = f. You may
assume all processors already know the value of X.
[c]---[g]---[f]
 |     |     |
[a]---[t]---[f]
 |     |     |
[c]---[f]---[c]
[20 marks]
Answer
We assume L is held in row major order on the mesh. Here p denotes the
side length of the mesh (n in the question), and L(k) is held at location
(i, j) where k = ip + j + 1 for i, j = 0, ..., p − 1, so that the indices k
run from 1 as in the example below. Initially each processor initialises a
variable Index as follows:
Index := INFINITY
If L(i,j) = X then
    Index := i*p + j + 1
The processors then execute MIN-2D-MESH(Index, p) to pass the minimum
index location to processor (0, 0).
Notation on variable passing:
P(i, j).Temp means the value of Temp at processor P(i, j).
MIN-2D-MESH(Y, p)
% Input: Y = [Y(1),...,Y(n)], n = p*p, one item per processor
% Output: Min(Y)
% compute column minimums
% compare rows (Row, Row+1) pairwise starting from bottom
% at end first row contains column minimum
For Row = p-2 down to 0 do
    For i = Row and 0 \le j \le p-1 in parallel do
    begin
        P(i,j).Temp := P(i+1,j).Y
        Y := Min(Y, Temp) % at processor (i,j)
    end
% compute minimum of first row sequentially
% working backwards from last column pairwise
For Column = p-2 down to 0 do
begin
    j := Column
    P(0,j).Temp := P(0,j+1).Y
    Y := Min(Y, Temp) % at processor (0,j)
end
Return P(0,0).Y
% 2(p-1) communication steps in total, i.e. O(p) time
Let M = INFINITY
[c]---[g]---[f]
 |     |     |
[a]---[t]---[f]
 |     |     |
[c]---[f]---[c]
Initial Index values:
[M]---[M]---[3]
 |     |     |
[M]---[M]---[6]
 |     |     |
[M]---[8]---[M]
After comparing the middle and bottom rows:
[M]---[M]---[3]
 |     |     |
[M]---[8]---[6]
 |     |     |
[M]---[8]---[M]
After comparing the top and middle rows:
[M]---[8]---[3]
 |     |     |
[M]---[8]---[6]
 |     |     |
[M]---[8]---[M]
First row only
[M]---[8]---[3]
[M]---[3]---[3]
[3]---[3]---[3]
Ans: Min index = 3
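The column phase followed by the row phase can be simulated sequentially as below. This Python sketch assumes 1-based row-major indices, matching the worked example; `mesh_search` is an illustrative name and the inner loop over j stands in for one parallel step across a row.

```python
import math


def mesh_search(grid, x):
    """Return the lowest 1-based row-major index of x on a p x p mesh."""
    p = len(grid)
    # Initialisation: the index where the item matches, infinity elsewhere.
    idx = [[i * p + j + 1 if grid[i][j] == x else math.inf
            for j in range(p)] for i in range(p)]
    # Column phase: row i takes the minimum with row i + 1, bottom to top.
    for row in range(p - 2, -1, -1):
        for j in range(p):         # parallel across the row
            idx[row][j] = min(idx[row][j], idx[row + 1][j])
    # Row phase: first row only, right to left.
    for col in range(p - 2, -1, -1):
        idx[0][col] = min(idx[0][col], idx[0][col + 1])
    return idx[0][0]


GRID = ["cgf", "atf", "cfc"]
print(mesh_search(GRID, "f"))
```

The simulation performs 2(p - 1) communication rounds, and an absent item leaves infinity at processor (0, 0).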
5. a. i. Explain the divide and conquer approach to the design of algorithms.
[4 marks]
ii. State an algorithm for sequential MergeSort of an array A[1...n]
which acts by divide and conquer. [5 marks]
iii. Use the array A = [6, 8, 7, 1, 4, 3] to illustrate the operation of this
algorithm. [3 marks]
iv. Explain how to produce a parallel version of MergeSort, to sort an
array A[1...n] in the multi-threading environment. What is the span
of this algorithm? [8 marks]
Answer
The purpose of this question is to examine the following
topic(s): Parallel sorting algorithms
This question addresses parallel divide and conquer strategies,
and parallel sorting.
(i) Divide and conquer strategy
1. Divide the input into partitions of almost equal size
2. Recursively solve the subproblems defined by the partition
3. Combine the solutions to the subproblems into a single answer and
pass it back as the answer to the recursion call
It must be possible to perform steps 1 and 3 efficiently for the divide and
conquer approach to work well.
(ii) Sequential MergeSort
Seql-MergeSort(L, p, r)
begin
    If p < r then
        q = ⌊(p + r)/2⌋
        Seql-MergeSort(L, p, q)
        Seql-MergeSort(L, q + 1, r)
        Seql-Merge(L, p, q, r)
end
(iii)
produce recursion tree
[6 8 7 1 4 3]
[6 8 7] [1 4 3]
[6 8] [7] [1 4] [3]
[6] [8] [7] [1] [4] [3]
start merging
[6 8] [7] [1 4] [3]
[6 7 8] [1 3 4]
[1 3 4 6 7 8]
Sorted
(iv)
Par-MergeSort(L, p, r)
begin
    If p < r then
        q = ⌊(p + r)/2⌋
        Spawn Par-MergeSort(L, p, q)
        Par-MergeSort(L, q + 1, r)
        Sync
        Seql-Merge(L, p, q, r)
end
No appreciable speed-up, as the cost is in the sequential merge. The span
satisfies T(n) = T(n/2) + Θ(n), so the span is Θ(n).
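The Spawn/Sync structure can be mimicked with Python threads, as sketched below. This only illustrates the control flow: the names `merge` and `par_merge_sort` are assumptions, one thread per spawn is only sensible for small inputs, and CPython threads are not expected to give a real speed-up here.

```python
import threading


def merge(left, right):
    """Sequential merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def par_merge_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive, 0-based) in place."""
    if lo < hi:
        mid = (lo + hi) // 2
        child = threading.Thread(target=par_merge_sort, args=(a, lo, mid))
        child.start()                       # Spawn: sort the left half
        par_merge_sort(a, mid + 1, hi)      # sort the right half ourselves
        child.join()                        # Sync: wait for the left half
        a[lo:hi + 1] = merge(a[lo:mid + 1], a[mid + 1:hi + 1])


data = [6, 8, 7, 1, 4, 3]
par_merge_sort(data, 0, len(data) - 1)
print(data)
```

The two halves are sorted concurrently, but every level still ends with a sequential merge, which is what keeps the span linear.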
b. i. State a sequential binary search algorithm Binary-Search(x, A, p, r)
to return the index of the first occurrence of x in the sorted array
A[p..r]. Assume that, if x < A[p], the index p is returned; similarly, if
x > A[r], the index r is returned. [10 marks]
ii. Illustrate the execution of algorithm Binary-Search(x, A, 1, 13) when
used to locate the first occurrence of x = 6 in the sorted array
A = [1, 3, 5, 6, 6, 7, 9, 14, 21, 30, 74, 75, 87]
[12 marks]
iii. Explain how to use sequential binary search to improve the running
time of parallel Merge-Sort, illustrating your explanation with a
suitable diagram. What is the span of the improved algorithm in the
multi-threading model? [8 marks]
Answer
(i) The following Binary Search returns the first occurrence of x ∈ T
Binary-Search(x, T, p, r)
if p = r return p (end of search)
else (carry on searching)
begin
    Low L = p
    High H = r
    while L < H
        Middle value q = ⌊(L + H)/2⌋
        if x ≤ T[q]
            (set High to Mid to search bottom half of T)
            H = q
        else L = q + 1 (search top half)
    return High H
end
(ii) Binary-Search(6, A, 1, 13)
A = [1, 3, 5, 6, 6, 7, 9, 14, 21, 30, 74, 75, 87]
marks 2.5 per step to give 12 (last step 2 marks)
Call               Low  High  q               A[q]  Test     Action
BinSrc(6,A,1,13)   1    13    ⌊(1+13)/2⌋ = 7  9     6 ≤ 9    set High to Mid: High = 7
BinSrc(6,A,1,7)    1    7     ⌊(1+7)/2⌋ = 4   6     6 ≤ 6    set High to Mid: High = 4
BinSrc(6,A,1,4)    1    4     ⌊(1+4)/2⌋ = 2   3     6 > 3    search top half: Low = q+1 = 3
BinSrc(6,A,3,4)    3    4     ⌊(3+4)/2⌋ = 3   5     6 > 5    search top half: Low = q+1 = 4
BinSrc(6,A,4,4)    4    4     4 = 4, end of search; no lower index holds x
return index 4
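The search can be run directly in Python. The sketch below keeps the 1-based indices of the question by reading t[q - 1] from a 0-based list; the function name `binary_search` is illustrative.

```python
def binary_search(x, t, p, r):
    """Return the first 1-based index in p..r with t[index] >= x, or r if none."""
    low, high = p, r
    while low < high:
        q = (low + high) // 2
        if x <= t[q - 1]:          # t is a 0-based Python list
            high = q               # search the bottom half, keeping q
        else:
            low = q + 1            # search the top half
    return high


A = [1, 3, 5, 6, 6, 7, 9, 14, 21, 30, 74, 75, 87]
print(binary_search(6, A, 1, 13))
```

Values below A[1] return index 1 and values above A[13] return index 13, as the question requires.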
(iii) To produce a Parallel Merge of two sorted arrays A, B:
Let q be the median index of A.
Let x = A(q), so that A = A(1...q − 1), x, A(q + 1...n).
Split B into B(< x) and B(≥ x) using Binary-Search(x, B).
The merged output is
Parallel-Merge(A(1...q−1), B(< x)), x, Parallel-Merge(A(q+1...n), B(≥ x)).
The left part has entries at most x,
the right part has entries at least x.
The two recursive merges can be done in parallel, thus removing the
bottleneck in the merge phase of sorting.
The span is poly-log n (Θ(log^3 n), proved in the Cormen textbook, Chapter
27).
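The splitting step can be sketched in Python using `bisect_left` from the standard library in place of Binary-Search. The recursion below runs sequentially; the function name `p_merge` and the swap that keeps the larger array as A are assumptions of this sketch. The two recursive calls are independent, so in a multi-threaded setting one of them could be spawned.

```python
from bisect import bisect_left


def p_merge(a, b):
    """Merge two sorted lists by splitting around the median of the larger one."""
    if len(a) < len(b):
        a, b = b, a                # keep a as the larger list
    if not a:
        return []
    q = len(a) // 2
    x = a[q]
    s = bisect_left(b, x)          # b[:s] < x <= b[s:]
    # The two sub-merges touch disjoint data and could run in parallel.
    return p_merge(a[:q], b[:s]) + [x] + p_merge(a[q + 1:], b[s:])


print(p_merge([6, 7, 8], [1, 3, 4]))
```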