CS 684
Algorithm Analysis Assumptions
Consider ring, mesh, and hypercube topologies.
Each process can either send or receive a single message at a time.
No special communication hardware.
When discussing a mesh architecture we will consider a square toroidal mesh.
Latency (startup time) is ts; the per-word transfer time (reciprocal of bandwidth) is tw.
Basic Algorithms
Broadcast Algorithms: one-to-all (scatter), all-to-one (gather), all-to-all
Reduction: all-to-one, all-to-all
Broadcast (ring)
Distribute a message of size m to all nodes.
Start the message both ways from the source.
[Figure: ring broadcast; the two copies of the message travel in opposite directions, reaching nodes in steps 1, 2, 3, 4 and meeting after p/2 hops.]
T = (ts + twm)(p/2)
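The bidirectional ring broadcast can be sketched as a short Python simulation (function names are my own, not from the slides):

```python
def ring_broadcast_steps(p, source=0):
    """Simulate one-to-all broadcast on a p-node ring, sending the
    message in both directions from the source. Returns the number
    of communication steps until every node has the message."""
    has_msg = [False] * p
    has_msg[source] = True
    left, right = source, source      # frontiers of the two directions
    steps = 0
    while not all(has_msg):
        left = (left - 1) % p         # counterclockwise hop
        right = (right + 1) % p       # clockwise hop
        has_msg[left] = True
        has_msg[right] = True
        steps += 1
    return steps

def ring_broadcast_time(p, m, ts, tw):
    """Cost model from the slide: T = (ts + tw*m) * (p/2)."""
    return (ts + tw * m) * (p // 2)
```

For p = 8 the two copies meet after p/2 = 4 steps, which is exactly the step count in the cost formula.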
Broadcast (mesh)
Broadcast to the source row using the ring algorithm.
Broadcast to the rest using the ring algorithm from the source row, down each column.
T = 2(ts + twm)(√p/2)
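A small helper (names are mine) that evaluates the two-phase mesh cost and compares it with the single bidirectional ring:

```python
import math

def ring_broadcast_time(p, m, ts, tw):
    """Single bidirectional ring: T = (ts + tw*m) * (p/2)."""
    return (ts + tw * m) * (p // 2)

def mesh_broadcast_time(p, m, ts, tw):
    """Square toroidal mesh: two ring-broadcast phases over sqrt(p)
    nodes each, T = 2 * (ts + tw*m) * (sqrt(p)/2)."""
    q = math.isqrt(p)
    assert q * q == p, "p must be a perfect square"
    return 2 * (ts + tw * m) * (q // 2)
```

For p = 16, m = 10, ts = 2, tw = 0.5 the mesh takes 28 time units against the ring's 56, reflecting the p/2 versus √p step counts.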
Broadcast (hypercube)
A message is sent along each dimension of the hypercube.
Parallelism grows as a binary tree.
[Figure: broadcast on a three-dimensional hypercube; each edge is labeled 1, 2, or 3 with the step in which it carries the message.]
T = (ts + twm)log2 p
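The binary-tree growth can be made concrete with a simulation (a sketch with my own function name): in step i every node that already holds the message forwards it across dimension i, so the set of holders doubles each step.

```python
def hypercube_broadcast(p, source=0):
    """Simulate one-to-all broadcast on a p-node hypercube (p a power
    of two). Returns one list of (src, dst) send pairs per step."""
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    holders = {source}
    schedule = []
    for i in range(d):
        # every current holder forwards across dimension i
        sends = [(u, u ^ (1 << i)) for u in holders]
        schedule.append(sends)
        holders |= {v for _, v in sends}
    assert holders == set(range(p))   # everyone has the message
    return schedule
```

For p = 8 there are log2 p = 3 steps with 1, 2, and 4 concurrent sends, matching the tree in the figure.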
Broadcast
Mesh algorithm was based on embedding rings in the mesh.
Can we do better on the mesh? Can we embed a tree in a mesh?
Exercise for the reader. (-: hint, hint ;-)
Other Broadcasts
Many algorithms for all-to-one and all-to-all communication are simply reversals and duals of the one-to-all broadcast.
Examples:
All-to-one: reverse the algorithm and concatenate.
All-to-all: butterfly and concatenate.
Scatter Operation
Often called one-to-all personalized communication.
Send a different message to each node.
[Figure: scatter of messages 1–8 on eight nodes; the source's set is halved at each step (1–4 / 5–8, then 1,2 / 3,4 / 5,6 / 7,8) until each node holds exactly one message.]
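The halving in the figure can be simulated on a hypercube (a sketch, function name mine): at each dimension a holder keeps the messages destined for its own subcube and sends the other half to its partner.

```python
def hypercube_scatter(p, source=0):
    """Simulate one-to-all personalized communication (scatter) on a
    p-node hypercube with source node 0. Returns a dict mapping each
    node to the set of message destinations it ends up holding."""
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    held = {source: set(range(p))}    # node -> destinations it holds
    for i in reversed(range(d)):      # split on the highest bit first
        for u in list(held):
            partner = u ^ (1 << i)
            # messages whose destination lies in the partner's subcube
            give = {m for m in held[u]
                    if (m >> i) & 1 == (partner >> i) & 1}
            held[u] -= give
            if give:
                held[partner] = held.get(partner, set()) | give
    return held
```

After log2 p steps each node holds exactly its own message, as in the figure's final row.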
Reduction Algorithms
Reduce or combine a set of values on each processor to a single set.
Summation, Max/Min.
Many reduction algorithms simply use the all-to-one broadcast algorithm.
The operation is performed at each node.
Reduction
If the goal is to have only one processor with the answer, use broadcast algorithms in reverse.
If all must know, use butterfly.
Reduces the algorithm from 2log p to log p steps.
How'd they do that?
Broadcast and reduction algorithms are based on Gray code numbering of nodes.
Consider a hypercube.
[Figure: three-dimensional hypercube with nodes labeled 000–111 (decimal 0–7). Neighboring nodes differ in only one bit location.]
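The property the figure highlights, that a hypercube node's neighbors are exactly the labels obtained by flipping one bit, can be checked in a few lines (helper names are mine):

```python
def neighbors(node, d):
    """The d neighbors of a hypercube node: flip each bit in turn."""
    return [node ^ (1 << i) for i in range(d)]

def differ_by_one_bit(a, b):
    """True iff labels a and b differ in exactly one bit position."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0   # x is a nonzero power of two
```

For example, node 110 (6) in a 3-cube neighbors 111 (7), 100 (4), and 010 (2), exactly as drawn.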
How'd they do that?
Start with the most significant bit.
Flip the bit and send to that processor.
Proceed with the next most significant bit.
Continue until all bits have been used.
Procedure SingleNodeAccum(d, my_id, m, X, sum)
  for j = 0 to m-1
    sum[j] = X[j]
  mask = 0
  for i = 0 to d-1
    if ((my_id AND mask) == 0)
      if ((my_id AND 2^i) != 0)
        msg_dest = my_id XOR 2^i
        send(sum, msg_dest)
      else
        msg_src = my_id XOR 2^i
        recv(X, msg_src)
        for j = 0 to m-1
          sum[j] += X[j]
      endif
    endif
    mask = mask XOR 2^i
  endfor
end
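The procedure can be exercised sequentially by simulating all 2^d nodes at once; a minimal sketch (loop structure follows the pseudocode, with send/recv replaced by direct array access):

```python
def single_node_accum(d, X):
    """Simulate all-to-one reduction on 2**d virtual hypercube nodes.
    X[i] is node i's local vector; the elementwise sum accumulates
    at node 0. Returns node 0's final sum vector."""
    p = 1 << d
    sums = [list(x) for x in X]           # each node starts with sum = X
    mask = 0
    for i in range(d):
        for my_id in range(p):
            # receivers: idle bits cleared and dimension-i bit clear
            if my_id & mask == 0 and my_id & (1 << i) == 0:
                src = my_id ^ (1 << i)    # partner sends its partial sum
                sums[my_id] = [a + b for a, b in zip(sums[my_id], sums[src])]
        mask ^= 1 << i                    # retire this dimension's bit
    return sums[0]
```

With d = 3 and X[i] = [i], node 0 ends with [0 + 1 + … + 7] = [28].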
All-to-all personalized Comm.
What about when everybody needs to communicate something different to everybody else?
Example: matrix transpose with row-wise partitioning.
Issues: Everybody scatter? Bottlenecks?
All-to-All Personalized Communication
All-to-all personalized communication.
All-to-All Personalized Communication: Example
Consider the problem of transposing a matrix.
Each processor contains one full row of the matrix.
The transpose operation in this case is identical to an all-to-all personalized communication operation.
All-to-All Personalized Communication: Example
All-to-all personalized communication in transposing a 4 x 4 matrix using four processes.
All-to-All Personalized Communication on a Ring
Each node sends all pieces of data as one consolidated message of size m(p – 1) to one of its neighbors.
Each node extracts the information meant for it from the data received, and forwards the remaining (p – 2) pieces of size m each to the next node.
The algorithm terminates in p – 1 steps.
The size of the message reduces by m at each step.
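The forwarding scheme can be simulated directly (a sketch with my own names): each piece is a pair (origin, destination), and every step each node keeps the pieces addressed to it and passes the rest around the ring.

```python
def ring_all_to_all_personalized(p):
    """Simulate all-to-all personalized communication on a p-node ring.
    Node i starts with pieces {(i, j) for j != i}; each step it sends
    its bundle to neighbor (i+1) % p, which keeps the pieces addressed
    to it and forwards the rest. Returns each node's received pieces."""
    outgoing = [{(i, j) for j in range(p) if j != i} for i in range(p)]
    received = [set() for _ in range(p)]
    for step in range(p - 1):
        new_outgoing = [set() for _ in range(p)]
        for i in range(p):
            nxt = (i + 1) % p
            bundle = outgoing[i]
            mine = {piece for piece in bundle if piece[1] == nxt}
            received[nxt] |= mine           # extract what is meant for nxt
            new_outgoing[nxt] = bundle - mine   # forward the remainder
        outgoing = new_outgoing
    return received
```

After p – 1 steps every node has received one piece from each other node, and each bundle shrank by one piece (m words) per step, as the slide states.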
All-to-All Personalized Communication on a Ring
All-to-all personalized communication on a six-node ring. The label of each message is of the form {x,y}, where x is the label of the node that originally owned the message, and y is the label of the node that is the final destination of the message. The label ({x1,y1}, {x2,y2},…, {xn,yn}) indicates a message that is formed by concatenating n individual messages.
All-to-All Personalized Communication on a Ring: Cost
We have p – 1 steps in all.
In step i, the message size is m(p – i).
The total time is given by:
T = Σ i=1..p–1 (ts + twm(p – i)) = (ts + twmp/2)(p – 1)
The tw term in this equation can be reduced by a factor of 2 by communicating messages in both directions.
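Summing the per-step costs ts + twm(p – i) over i = 1..p–1 gives the closed form (ts + twmp/2)(p – 1); a quick numeric check that the two agree (helper names are mine):

```python
def ring_a2a_time_sum(p, m, ts, tw):
    """Total time as the explicit sum: step i sends m*(p - i) words."""
    return sum(ts + tw * m * (p - i) for i in range(1, p))

def ring_a2a_time_closed(p, m, ts, tw):
    """Closed form of the same sum: (ts + tw*m*p/2) * (p - 1)."""
    return (ts + tw * m * p / 2) * (p - 1)
```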
All-to-All Personalized Communication on a Mesh
Each node first groups its p messages according to the columns of their destination nodes.
All-to-all personalized communication is performed independently in each row with clustered messages of size m√p.
Messages in each node are sorted again, this time according to the rows of their destination nodes.
All-to-all personalized communication is performed independently in each column with clustered messages of size m√p.
All-to-All Personalized Communication on a Mesh
The distribution of messages at the beginning of each phase of all-to-all personalized communication on a 3 x 3 mesh. At the end of the second phase, node i has messages ({0,i},…,{8,i}), where 0 ≤ i ≤ 8. The groups of nodes communicating together in each phase are enclosed in dotted boundaries.
All-to-All Personalized Communication on a Mesh: Cost
Time for the first phase is identical to that in a ring with √p processors, i.e., (ts + twmp/2)(√p – 1).
Time in the second phase is identical to the first phase. Therefore, the total time is twice this, i.e.,
T = (2ts + twmp)(√p – 1)
It can be shown that the time for local rearrangement is much less than this communication time.
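Since each phase is a ring all-to-all over √p nodes with bundles of size m√p, the total is twice the per-phase time, (2ts + twmp)(√p – 1); a small check of that relationship (function names are mine):

```python
import math

def mesh_a2a_phase_time(p, m, ts, tw):
    """One phase: ring all-to-all over sqrt(p) nodes with bundles of
    size m*sqrt(p), i.e. (ts + tw*m*p/2) * (sqrt(p) - 1)."""
    q = math.isqrt(p)
    return (ts + tw * m * p / 2) * (q - 1)

def mesh_a2a_total_time(p, m, ts, tw):
    """Both phases together: (2*ts + tw*m*p) * (sqrt(p) - 1)."""
    q = math.isqrt(p)
    return (2 * ts + tw * m * p) * (q - 1)
```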
All-to-All Personalized Communication on a Hypercube
Generalize the mesh algorithm to log p steps.
At any stage in all-to-all personalized communication, every node holds p packets of size m each.
While communicating in a particular dimension, every node sends p/2 of these packets (consolidated as one message).
A node must rearrange its messages locally before each of the log p communication steps.
All-to-All Personalized Communication on a Hypercube
An all-to-all personalized communication algorithm on a three-dimensional hypercube.
All-to-All Personalized Communication on a Hypercube
We have log p iterations, and mp/2 words are communicated in each iteration. Therefore, the cost is:
T = (ts + twmp/2) log p
This is not optimal!
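To see why the log p-step algorithm is not optimal, compare its cost, (ts + twmp/2) log p, with the pairwise scheme described next, which performs p – 1 exchanges of m words each; the comparison helpers below are my own:

```python
import math

def a2a_hypercube_time(p, m, ts, tw):
    """log2(p) steps, each moving a consolidated m*p/2-word message."""
    return (ts + tw * m * p / 2) * math.log2(p)

def a2a_pairwise_time(p, m, ts, tw):
    """Pairwise schedule: p - 1 exchanges of m words each."""
    return (ts + tw * m) * (p - 1)
```

Each node must send at least m(p – 1) words in total, and the pairwise scheme's tw term matches that bound; the log p-step algorithm moves (mp/2) log p words per node, which grows faster. For p = 1024 with ts = 0, the bandwidth cost is 5120m versus 1023m.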
All-to-All Personalized Communication on a Hypercube: Optimal Algorithm
Each node performs p – 1 communication steps, exchanging m words of data with a different node in every step.
A node must choose its communication partner in each step so that the hypercube links do not suffer congestion.
In the jth communication step, node i exchanges data with node (i XOR j).
In this schedule, all paths in every communication step are congestion-free, and none of the bidirectional links carry more than one message in the same direction.
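Two properties of the i XOR j schedule are easy to verify programmatically (a sketch, names mine): each step is a perfect pairing (my partner's partner is me), and over the p – 1 steps every node meets every other node exactly once.

```python
def xor_schedule(p):
    """Partner table for the pairwise exchange: in step j, node i
    exchanges with node i XOR j, for j = 1 .. p-1."""
    return {j: [i ^ j for i in range(p)] for j in range(1, p)}

def is_valid_exchange_schedule(p):
    sched = xor_schedule(p)
    for partners in sched.values():
        # each step must be a perfect matching with no self-pairing
        if any(partners[partners[i]] != i or partners[i] == i
               for i in range(p)):
            return False
    # across all steps, every node meets every other exactly once
    return all(sorted(sched[j][i] for j in sched) ==
               [k for k in range(p) if k != i] for i in range(p))
```

(Checking link congestion as well would require routing each exchange through the cube; only the pairing properties are checked here.)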
Seven steps in all-to-all personalized communication on an eight-node hypercube.
All-to-All Personalized Communication on a Hypercube: Optimal Algorithm
A procedure to perform all-to-all personalized communication on a d-dimensional hypercube. The message Mi,j initially resides on node i and is destined for node j.
All-to-all personalized hypercube
In step j, node i exchanges data with node i XOR j (labels in binary and decimal):

node     step 1   step 2   step 3   step 4   step 5
000 (0)  001 (1)  010 (2)  011 (3)  100 (4)  101 (5)
001 (1)  000 (0)  011 (3)  010 (2)  101 (5)  100 (4)
010 (2)  011 (3)  000 (0)  001 (1)  110 (6)  111 (7)
011 (3)  010 (2)  001 (1)  000 (0)  111 (7)  110 (6)
100 (4)  101 (5)  110 (6)  111 (7)  000 (0)  001 (1)
101 (5)  100 (4)  111 (7)  110 (6)  001 (1)  000 (0)
110 (6)  111 (7)  100 (4)  101 (5)  010 (2)  011 (3)
111 (7)  110 (6)  101 (5)  100 (4)  011 (3)  010 (2)
Etc.
Basic Communication Algorithms
Many different ways.
Goal: perform communication in the least time.
Depends on architecture.
Hypercube algorithms are often provably optimal, even on a fully connected architecture.