Implementing Parallel Graph Algorithms (Spring 2015)
Lecture 2: Introduction
Roman Manevich, Ben-Gurion University
Graph Algorithms are Ubiquitous
• Computational biology
• Social networks
• Computer graphics
Agenda
• Operator formulation of graph algorithms
• Implementation considerations for sequential graph programs
• Optimistic parallelization of graph algorithms
• Introduction to the Galois system
Operator formulation of graph algorithms
Main Idea
• Define a high-level abstraction of graph algorithms in terms of
  – Operator
  – Schedule
  – Delta
• Given a new algorithm, describe it as a composition of these elements
  – Enables many implementations
  – Find one suitable for the typical input and architecture
Example: Single-Source Shortest-Path
• Problem formulation
  – Compute the shortest distance from a source node S to every other node
• Many algorithms
  – Bellman-Ford (1957)
  – Dijkstra (1959)
  – Chaotic relaxation (Miranker 1969)
  – Delta-stepping (Meyer et al. 1998)
• Common structure
  – Each node has a label dist holding the best known shortest distance from S
• Key operation: relax-edge(u,v)

  if dist(A) + WAC < dist(C)
    dist(C) = dist(A) + WAC

[Figure: example graph with source S and nodes A-G; relaxing the edge (A,C) with weight WAC lowers dist(C).]
Dijkstra's Algorithm
• Scheduling of relaxations:
  – Use a priority queue of nodes, ordered by label dist
  – Iterate over nodes u in priority order
  – On each step: relax all neighbors v of u, i.e., apply relax-edge to every edge (u,v)

[Figure: a run on the example graph; the priority queue evolves through states <C,3> <B,5>, then <B,5> <E,6> <D,7>, then <B,5>.]
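The Dijkstra-style schedule above can be sketched as a short Python program. This is a minimal illustration, not Galois code: the adjacency-dict graph encoding and the function name are assumptions made for the example.

```python
# Dijkstra-style SSSP: a priority queue of nodes ordered by dist
# schedules the relax-edge applications, as on the slide.
import heapq

def dijkstra_sssp(graph, source):
    """graph: {node: [(neighbor, weight), ...]}. Returns dist labels."""
    dist = {n: float("inf") for n in graph}
    dist[source] = 0
    pq = [(0, source)]                      # priority queue ordered by dist
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist[u]:                    # stale queue entry, skip it
            continue
        for v, w in graph[u]:               # relax-edge(u, v)
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

Pushing a fresh entry on every improvement and skipping stale ones is a common substitute for a decrease-key operation.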
Chaotic Relaxation
• Scheduling of relaxations:
  – Use an unordered set of edges
  – Iterate over edges (u,v) in any order
  – On each step: apply relax-edge to edge (u,v)

[Figure: the same example graph; edges such as (S,A), (B,C), (C,D), (C,E) are relaxed in arbitrary order.]
Algorithms as Scheduled Operators

Dijkstra-style:
  Q = PQueue[Node]
  Q.enqueue(S)
  while Q ≠ ∅ {
    u = Q.pop
    foreach (u,v,w) {
      if d(u) + w < d(v)
        d(v) := d(u) + w
        Q.enqueue(v)
    }
  }

Chaotic-Relaxation:
  W = Set[Edge]
  W = {(S,y) : y ∈ Nbrs(S)}
  while W ≠ ∅ {
    (u,v) = W.get
    if d(u) + w < d(v)
      d(v) := d(u) + w
      foreach y ∈ Nbrs(v)
        W.add(v,y)
  }

Graph Algorithm = Operator(s) + Schedule
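The chaotic-relaxation schedule can likewise be sketched in Python: the final distances do not depend on the order in which worklist edges are picked, only on running to fixpoint. The graph encoding (adjacency dict) and the function name are illustrative assumptions.

```python
# Chaotic relaxation: pop edges from an unordered worklist in
# arbitrary (here: pseudo-random) order and relax each one.
import random

def chaotic_sssp(graph, source, seed=0):
    """graph: {node: [(neighbor, weight), ...]}. Returns dist labels."""
    rng = random.Random(seed)
    dist = {n: float("inf") for n in graph}
    dist[source] = 0
    work = [(source, v, w) for v, w in graph[source]]
    while work:
        i = rng.randrange(len(work))        # any order is correct
        u, v, w = work.pop(i)
        if dist[u] + w < dist[v]:           # relax-edge(u, v)
            dist[v] = dist[u] + w
            work.extend((v, y, wy) for y, wy in graph[v])
    return dist
```

Termination follows because an edge is re-added only when a label strictly improves, and labels can only decrease finitely often.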
Deconstructing Schedules

Graph Algorithm = Operators + Schedule
• Operators: what should be done
  – Identify new activities (the operator's delta)
• Schedule: how it should be done
  – Order activity processing (unordered vs. ordered algorithms)
  – Static schedule: code structure (loops)
  – Dynamic schedule: priority in a work queue

("The TAO of Parallelism in Algorithms", PLDI 2011)
Example

Graph Algorithm = Operators + Schedule (order activity processing)
The Dijkstra-style and chaotic-relaxation programs shown above are the same relax operator under two different schedules.
SSSP in Elixir

Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]      // Graph type

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ] ➔ [ bd = ad + w ]              // Operator

sssp = iterate relax ≫ schedule                        // Fixpoint statement
Operators

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)      // redex pattern
          bd > ad + w ]                  // guard
      ➔ [ bd = ad + w ]                  // update

[Figure: edge a→b with weight w; when bd > ad + w, b's label is lowered to ad + w.]
Fixpoint Statement

sssp = iterate relax ≫ schedule

Apply the operator until fixpoint; ≫ attaches a scheduling expression.
Scheduling Examples

sssp = iterate relax ≫ schedule

• Locality-enhanced label-correcting:
    group b ≫ unroll 2 ≫ approx metric ad
• Dijkstra-style:
    metric ad ≫ group b

Dijkstra-style pseudocode:
  q = new PrQueue
  q.enqueue(SRC)
  while (!q.empty) {
    a = q.dequeue
    for each e = (a,b,w) {
      if dist(a) + w < dist(b) {
        dist(b) = dist(a) + w
        q.enqueue(b)
      }
    }
  }
Implementation considerations for sequential graph programs
Parallel Graph Algorithm = Operators + Schedule
• Operators: identify new activities (operator delta inference)
• Schedule: order activity processing (static or dynamic schedule)
Finding the Operator Delta
Problem Statement
• Many graph programs have the form
    until no change do { apply operator }
• Naïve implementation: repeatedly scan the whole graph for places where the operator can be applied to make a change
  – Problem: too slow
• Incremental implementation: after applying the operator, find the smallest set of future active elements and schedule them (add them to a worklist)
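The difference can be made concrete by instrumenting both SSSP variants to count how many edges they examine. This is an illustrative sketch; the adjacency-dict graph encoding and function names are assumptions, not part of any system described here.

```python
# Naive fixpoint: rescan every edge until a full pass makes no change.
def naive_fixpoint(graph, source):
    dist = {n: float("inf") for n in graph}
    dist[source] = 0
    checks, changed = 0, True
    while changed:                          # until no change do { apply operator }
        changed = False
        for u in graph:
            for v, w in graph[u]:
                checks += 1
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
    return dist, checks

# Incremental version: only edges scheduled on the worklist are checked.
def worklist_fixpoint(graph, source):
    dist = {n: float("inf") for n in graph}
    dist[source] = 0
    checks = 0
    work = [(source, v, w) for v, w in graph[source]]
    while work:
        u, v, w = work.pop()
        checks += 1
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            work.extend((v, y, wy) for y, wy in graph[v])
    return dist, checks
```

Both compute the same labels; the worklist version examines no more edges than the rescanning version, and usually far fewer.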
Identifying the Delta of an Operator

[Figure: after relax1 updates edge (a,b), which edges may become newly active?]
Delta Inference Example
Consider relax1 on edge (a,b) with weight w1, followed by relax2 on another incoming edge (c,b) with weight w2.

Query program for the SMT solver:
  assume (da + w1 < db)
  assume ¬(dc + w2 < db)
  db_post = da + w1
  assert ¬(dc + w2 < db_post)

The solver proves the assertion: lowering db cannot make the inactive edge (c,b) active, so (c,b) does not become active.
22
assume (da + w1 < db)
assume ¬(db + w2 < dc)
db_post = da + w1
assert ¬(db_post + w2 < dc)Query Program
Delta Inference Example – Active
SMT Solver
SMT Solver
ba
relax1
cw1
relax2
w2
Apply relax on all outgoing edges (b,c) such that:
dc > db +w2
and c a≄
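The two queries above can be sanity-checked without an SMT solver by brute-force enumeration over small integer values. The slides use an SMT solver; exhaustive search over a bounded range is only an illustration (it checks the bounded case, not the general theorem), and the function names are mine.

```python
# Enumerate small non-negative values for the labels and weights and
# look for a counterexample to each delta-inference assertion.
from itertools import product

def incoming_edge_can_activate():
    """Can relax(a,b) activate another incoming edge (c,b)?"""
    for da, db, dc, w1, w2 in product(range(6), repeat=5):
        if da + w1 < db and not (dc + w2 < db):   # the two assumptions
            db_post = da + w1
            if dc + w2 < db_post:                 # assertion violated?
                return True
    return False

def outgoing_edge_can_activate():
    """Can relax(a,b) activate an outgoing edge (b,c)?"""
    for da, db, dc, w1, w2 in product(range(6), repeat=5):
        if da + w1 < db and not (db + w2 < dc):
            db_post = da + w1
            if db_post + w2 < dc:
                return True
    return False
```

The first search finds no violation (incoming edges stay inactive), while the second finds one immediately (outgoing edges can become active), matching the two slides.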
Influence Patterns
After updating edge (a,b), a candidate edge (c,d) can be influenced only if it shares nodes with (a,b). The possible sharing patterns are: a=c, a=d, b=c, b=d, and the combined patterns a=c ∧ b=d and a=d ∧ b=c.
Implementing the Operator

Example: Triangle Counting
• How many triangles exist in a graph (or, per node, how many triangles contain it)?
• Useful for estimating the community structure of a network
Triangles Pseudo-code

for a : nodes do
  for b : nodes do
    for c : nodes do
      if edges(a,b)
        if edges(b,c)
          if edges(c,a)
            if a < b
              if b < c
                if a < c
                  triangles++
Example: Triangles

for a : nodes do            // iterators
  for b : nodes do
    for c : nodes do
      if edges(a,b)         // graph conditions
        if edges(b,c)
          if edges(c,a)
            if a < b        // scalar conditions
              if b < c
                if a < c
                  triangles++
Triangles: Reordering
The iterators, graph conditions, and scalar conditions can be reordered: each condition can be hoisted to the point where its variables are first bound, pruning the search early.

for a : nodes do
  for b : nodes do
    if edges(a,b)
      if a < b
        for c : nodes do
          if edges(b,c)
            if edges(c,a)
              if b < c
                if a < c
                  triangles++
Triangles: Implementation Selection

Tile: an iterator immediately followed by a graph condition on the iterated variable can be fused into an iteration over successors:

  for x : nodes do
    if edges(x,y)
  ⇩
  for x : Succ(y) do

Applying reordering + implementation selection:

for a : nodes do
  for b : Succ(a) do
    for c : Succ(b) do
      if edges(c,a)
        if a < b
          if b < c
            if a < c
              triangles++
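The effect of the transformation can be checked with a runnable sketch of both loop nests: the naive triple loop over all nodes and the successor-based version. The adjacency-set encoding is an assumption for the example; the graph is undirected, stored with edges in both directions.

```python
# Naive loop nest: iterate over all node triples, then test the
# graph and scalar conditions, as in the original pseudo-code.
def triangles_naive(nodes, edges):
    count = 0
    for a in nodes:
        for b in nodes:
            for c in nodes:
                if (a, b) in edges and (b, c) in edges and (c, a) in edges:
                    if a < b and b < c:       # count each triangle once
                        count += 1
    return count

# After reordering + implementation selection: iterate successors,
# leaving only the closing-edge check and the scalar conditions.
def triangles_succ(nodes, succ):
    count = 0
    for a in nodes:
        for b in succ[a]:
            for c in succ[b]:
                if a in succ[c] and a < b and b < c:
                    count += 1
    return count
```

Both versions count each triangle exactly once (the a < b < c ordering picks one representative per triangle), but the second only explores node pairs that are actually connected.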
Optimistic parallelization of graph programs

Parallelism is Everywhere
• Texas Advanced Computing Center
• Cell phones
• Laptops
Example: Boruvka's Algorithm for MST
Minimum Spanning Tree Problem

[Figure: a weighted undirected graph on nodes a-g with edge weights 1-7; a second slide highlights the edges of its minimum spanning tree.]
Boruvka's Minimum Spanning Tree Algorithm

Build MST bottom-up:
repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node

[Figure: merging a with its lightest neighbor contracts them into a single node (a,c); parallel edges keep the minimum weight.]
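A sequential sketch of Boruvka's algorithm follows. The slide merges nodes by explicit graph contraction; this version tracks components with union-find instead, which computes the same MST weight (assuming distinct edge weights, which avoids tie-breaking cycles). All names here are illustrative.

```python
# Boruvka's algorithm over an edge list, using union-find components
# in place of the slide's explicit node merging.
def boruvka_mst(n, edges):
    """n nodes (0..n-1), edges as (weight, u, v). Returns total MST weight."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst_weight, components = 0, n
    while components > 1:
        # lightest outgoing edge per component (the slide's 'lt' neighbor)
        best = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                if ru not in best or w < best[ru][0]:
                    best[ru] = (w, ru, rv)
                if rv not in best or w < best[rv][0]:
                    best[rv] = (w, ru, rv)
        if not best:
            break                           # graph is disconnected
        for w, ru, rv in best.values():
            if find(ru) != find(rv):        # may already be merged this round
                parent[find(ru)] = find(rv)
                mst_weight += w
                components -= 1
    return mst_weight
```

Each round merges every component with its lightest neighbor, so the number of components at least halves per round and the loop runs O(log n) times.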
Parallelism in Boruvka

Build MST bottom-up:
repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node

[Figure: the example graph; several nodes can run the merge step at once.]
Non-conflicting Iterations

[Figure: two merge activities on disjoint parts of the graph, contracting (a,c) and (f,g), proceed in parallel without conflict.]
Conflicting Iterations

[Figure: two merge activities whose neighborhoods overlap; they conflict, so only one of them may proceed.]
Optimistic parallelization of graph algorithms

How to Parallelize Graph Algorithms
• "The TAO of Parallelism in Graph Algorithms", PLDI 2011
• Optimistic parallelization
• Implemented by the Galois system
Operator Formulation of Algorithms
• Active element
  – Site where computation is needed
• Operator
  – Computation at an active element
  – Activity: application of the operator to an active element
• Neighborhood
  – Set of nodes/edges read/written by an activity
  – Usually distinct from the graph neighbors of the active element
• Ordering: scheduling constraints on the execution order of activities
  – Unordered algorithms: no semantic constraints, but performance may depend on the schedule
  – Ordered algorithms: problem-dependent order
• Amorphous data-parallelism
  – Multiple active elements can be processed in parallel, subject to neighborhood and ordering constraints

Parallel program = Operator + Schedule + Parallel data structure
(What is the parallel data structure, and who implements it?)
Optimistic Parallelization in Galois
• Programming model
  – Client code has sequential semantics
  – Library of concurrent data structures
• Parallel execution model
  – Activities executed speculatively
• Runtime conflict detection
  – Each node/edge has an associated exclusive lock
  – Graph operations acquire locks on read/written nodes/edges
  – Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
  – All locks released at the end
• Runtime book-keeping (source of overhead)
  – Locking
  – Undo actions
Avoiding Rollbacks
45
Cautious Operators• When an iteration aborts before completing its work we
need to undo all of its changes– Log each change to the graph and upon abort apply reverse
actions in reverse order– Expensive to maintain– Not supported by Galois systems for C++
• How can we avoid maintaining rollback data?• An operator is cautious if it never performs changes
before acquiring all locks– In this case upon abort there are no changes to be undone– Can ensure operator is cautious by adding code to acquire
locks before making any changes
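The cautious-operator discipline can be sketched with ordinary per-node locks: acquire every lock in the neighborhood first, and only then mutate, so an abort needs no undo log. This is a toy illustration, not the Galois API; the lock-acquisition order and the ConflictError name are assumptions of the sketch (Galois resolves conflicts by aborting, which this mimics with non-blocking acquires).

```python
# Run an activity cautiously: read-only lock-acquisition phase first,
# mutation only after the full lockset is held (the failsafe point).
import threading

class ConflictError(Exception):
    """Raised when a neighborhood lock is held by another activity."""

def run_cautious(node_locks, neighborhood, mutate):
    acquired = []
    try:
        for n in sorted(neighborhood):              # grow the lockset
            if not node_locks[n].acquire(blocking=False):
                raise ConflictError(n)              # abort: nothing to undo
            acquired.append(n)
        mutate()                                    # all locks held from here on
    finally:
        for n in acquired:                          # all locks released at the end
            node_locks[n].release()
```

Because `mutate` runs only after every lock is held, an aborted activity has touched no shared state, which is exactly why cautious operators need no rollback data.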
Failsafe Points

foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW, lt = minWeightEdge((a, lt), (a, n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

Program point P is failsafe if, for every future program point Q, the lockset at Q is already contained in the lockset of P:
  ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
Is this Code Cautious?

No. The operator mutates the graph (g.removeEdge(a, lt)) and only afterwards reads the neighbors of lt (g.neighbors(lt)), which acquires new locks; changes are made before the lockset is complete.
Rewrite as Cautious Operator
Insert a read of lt's neighborhood before the first mutation, so that all locks are acquired up front:

foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW, lt = minWeightEdge((a, lt), (a, n));
  }
  g.neighbors(lt);        // acquire lt's neighborhood before any change
  g.removeEdge(a, lt);    // lockset is stable from here on: failsafe
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}
So Far
• Operator formulation of graph algorithms
• Implementation considerations for sequential graph programs
• Optimistic parallelization of graph algorithms
• Introduction to the Galois system
Next Steps
• Divide into groups
• Algorithm proposal
  – Due date: 15/4
  – Phrase the algorithm in terms of the operator formulation
  – Define the delta if necessary
  – Submit a proposal with a description of the algorithm + pseudo-code
  – A LaTeX template will be on the web-site soon
• Lecture on 15/4 on implementing your algorithm via Galois