Large-Scale Network Analysis with the Boost Graph Libraries
Douglas Gregor, Open Systems Lab, Indiana University
What are the BGLs?
A collection of libraries for computation on graphs/networks:
  graph data structures
  graph algorithms
  graph input/output
Common design:
  flexibility/customizability throughout
  obsessed with performance
  common interfaces throughout the collection
All open source, freely available online.
The BGL Family
The Original (sequential) BGL
BGL-Python
The Parallel BGL
Parallel BGL-Python
The Original BGL
The largest and most mature BGL:
  ~7 years of research and development
  many users and contributors outside of the OSL
  steadily evolving
Written in C++:
  generic
  highly customizable
  efficient (in both storage and execution)
BGL: Graph Data Structures
Graphs:
  adjacency_list: highly configurable, with user-specified containers for vertices and edges
  adjacency_matrix
  compressed_sparse_row
Adaptors: subgraphs, filtered graphs, reverse graphs, LEDA and Stanford GraphBase
Or use your own graph type.
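To make the compressed sparse row idea concrete, here is a minimal sketch of the CSR layout in plain Python. It is not the BGL's compressed_sparse_row_graph; the CsrGraph class and its members are hypothetical names chosen for illustration. All out-edges live in one flat targets array, and an offsets array records where each vertex's edges begin.

```python
from typing import List, Tuple


class CsrGraph:
    """Hypothetical minimal CSR graph, not the BGL's compressed_sparse_row_graph.

    The out-neighbors of vertex v are targets[offsets[v]:offsets[v + 1]].
    """

    def __init__(self, n: int, edges: List[Tuple[int, int]]) -> None:
        self.offsets = [0] * (n + 1)
        for u, _ in edges:  # count the out-degree of each source vertex
            self.offsets[u + 1] += 1
        for v in range(n):  # prefix sums turn degrees into start positions
            self.offsets[v + 1] += self.offsets[v]
        self.targets = [0] * len(edges)
        cursor = self.offsets[:-1]  # copy: next free slot for each vertex
        for u, w in edges:
            self.targets[cursor[u]] = w
            cursor[u] += 1

    def num_vertices(self) -> int:
        return len(self.offsets) - 1

    def out_neighbors(self, v: int) -> List[int]:
        return self.targets[self.offsets[v]:self.offsets[v + 1]]


g = CsrGraph(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
for v in range(g.num_vertices()):
    print(v, g.out_neighbors(v))
```

Compared with an adjacency list that keeps a separate container per vertex, this layout needs only two flat arrays, which is compact and cache-friendly, but adding edges after construction is expensive.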
Original BGL: Algorithms
Searches (breadth-first, depth-first, A*)
Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
All-pairs shortest paths (Johnson, Floyd-Warshall)
Minimum spanning tree (Kruskal, Prim)
Components (connected, strongly connected, biconnected)
Maximum cardinality matching
Max-flow (Edmonds-Karp, push-relabel)
Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
Betweenness centrality, PageRank, isomorphism, vertex coloring, transitive closure, dominator tree
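Breadth-first search, the first search listed above, fits in a few lines. This is a plain Python illustration that assumes a dict-of-lists adjacency structure; it is not the BGL's breadth_first_search, which is generic over graph types and reports its progress through a visitor.

```python
from collections import deque
from typing import Dict, List


def breadth_first_distances(adj: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Return the hop distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:  # first discovery gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(breadth_first_distances(adj, "a"))
```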
Task: Biconnected Components
[Figure: input graph and output graph]
Articulation points: B G A
Define a Graph Type
Determine vertex/edge properties:
    struct Vertex { string name; };
    struct Edge { int bicomponent; };
Determine the graph type:
    typedef adjacency_list<
        /*OutEdgeListS=*/ vecS,
        /*VertexListS=*/ vecS,
        /*DirectedS=*/ undirectedS,
        /*VertexProperty=*/ Vertex,
        /*EdgeProperty=*/ Edge> Graph;
Read in a GraphViz DOT File
Build an empty graph:
    Graph g;
Map vertex properties:
    dynamic_properties dyn;
    dyn.property("node_id", get(&Vertex::name, g));
Read in the GraphViz graph:
    ifstream in("biconnected_components.dot");
    read_graphviz(in, g, dyn);
Run Biconnected Components
Keep track of the articulation points:
    vector<Graph::vertex_descriptor> art_points;
Compute biconnected components:
    biconnected_components(g, get(&Edge::bicomponent, g),
                           back_inserter(art_points));
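For readers curious about what biconnected_components is computing, the sketch below finds articulation points with the classic Hopcroft-Tarjan depth-first search: a vertex is a cut vertex when some DFS subtree below it cannot reach an earlier-discovered vertex except through it. This is a plain Python illustration for small, simple undirected graphs (it recurses once per vertex along a DFS path, so very deep graphs will hit Python's recursion limit); it is not the BGL's implementation, and unlike the BGL call it reports only articulation points, not a bicomponent number per edge.

```python
from typing import Dict, List, Optional, Set


def articulation_points(adj: Dict[str, List[str]]) -> Set[str]:
    """Cut vertices of a simple undirected graph, via DFS discovery/low times."""
    disc: Dict[str, int] = {}  # DFS discovery time of each vertex
    low: Dict[str, int] = {}   # earliest discovery time reachable from the subtree
    result: Set[str] = set()

    def dfs(u: str, parent: Optional[str]) -> None:
        disc[u] = low[u] = len(disc)
        children = 0
        for v in adj[u]:
            if v not in disc:  # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u separates v's subtree if that subtree
                # cannot reach above u without passing through u.
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
            elif v != parent:  # back edge to an ancestor
                low[u] = min(low[u], disc[v])
        # The DFS root is a cut vertex iff it has two or more DFS children.
        if parent is None and children >= 2:
            result.add(u)

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return result


# Two triangles sharing vertex c, plus a pendant vertex f hanging off e.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d", "f"],
    "f": ["e"],
}
print(sorted(articulation_points(adj)))
```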
Output Results
Attach the bicomponent number to the "label" property of edges:
    dyn.property("label", get(&Edge::bicomponent, g));
Write results to another GraphViz file:
    ofstream out("bc_out.dot");
    write_graphviz(out, g, dyn);
Show articulation points:
    cout << "Articulation points: ";
    for (size_t i = 0; i < art_points.size(); ++i) {
        cout << g[art_points[i]].name << ' ';
    }
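The output file is ordinary GraphViz DOT text in which each edge carries a label attribute. As a rough illustration of that format only (write_graphviz's exact output may differ), this hypothetical to_dot helper renders an undirected edge list with per-edge bicomponent labels.

```python
from typing import Dict, List, Tuple


def to_dot(edges: List[Tuple[str, str]], labels: Dict[Tuple[str, str], int]) -> str:
    """Render an undirected graph as GraphViz DOT, labeling each edge."""
    lines = ["graph G {"]
    for u, v in edges:
        # Quote vertex names so names with spaces stay valid DOT IDs
        # (embedded double quotes are not escaped in this sketch).
        lines.append(f'  "{u}" -- "{v}" [label="{labels[(u, v)]}"];')
    lines.append("}")
    return "\n".join(lines)


edges = [("A", "B"), ("B", "C")]
dot = to_dot(edges, {("A", "B"): 0, ("B", "C"): 1})
print(dot)
```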
Original BGL Summary
The original BGL is large, stable, and efficient:
  lots of algorithms and graph types
  peer-reviewed code with many users, nightly regression testing, etc.
  performance comparable to FORTRAN
Who should use the BGL?
  programmers comfortable with C++
  users with graph sizes from tens of vertices to millions of vertices
BGL-Python
Python is ideal for rapid prototyping:
  it's a scripting language (no compiler)
  dynamic typing means less typing for you
  easy to use: you already know Python…
BGL-Python provides access to the BGL from within Python:
  similar interfaces to the C++ BGL
  easier to learn than C++
  great for scripting and GUI applications
  built-in help, e.g. help(bgl.dijkstra_shortest_paths)
Example: Biconnected Components
    import boost.graph as bgl  # Pull in the BGL bindings
    g = bgl.Graph.read_graphviz("biconnected_components.dot")

    # Compute biconnected components and articulation points
    bicomponent = g.edge_property_map('int')
    art_points = bgl.biconnected_components(g, bicomponent)

    # Save results with bicomponent numbers as edge labels
    g.edge_properties['label'] = bicomponent
    g.write_graphviz("biconnected_components_out.dot")

    # Show the articulation points by name
    node_id = g.vertex_properties['node_id']
    print("Articulation points: " + " ".join(str(node_id[v]) for v in art_points))
Wrapping the BGL in Python
BGL-Python is not a "port" or a reimplementation.
BGL-Python wraps the C++ BGL:
  Python calls translate to C++ calls
  C++ can call back into Python
Most of the speed of C++, most of the flexibility of Python.
Performance: Shortest Paths
[Figure: running time in seconds (axis 0 to 30) for BGL Dijkstra, BGL Dijkstra with a Python visitor, and Python Dijkstra.]
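The middle bar in the chart runs the C++ Dijkstra while calling back into Python during the search. The sketch below mimics that shape in plain Python: a binary-heap Dijkstra with an optional on_examine callback invoked once per finalized vertex, loosely modeled on the examine_vertex event of BGL visitors. The callback name and signature are assumptions for illustration; the point is that every callback adds per-vertex overhead, which is presumably what separates the first two bars.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(neighbor, weight)]


def dijkstra(graph: Graph, source: str,
             on_examine: Optional[Callable[[str], None]] = None) -> Dict[str, float]:
    """Shortest-path distances from source; weights must be non-negative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:  # stale heap entry: u was already finalized
            continue
        done.add(u)
        if on_examine is not None:
            on_examine(u)  # visitor-style hook, once per finalized vertex
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):  # edge relaxation
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


g = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("t", 6.0)],
     "b": [("t", 3.0)], "t": []}
order: List[str] = []
dist = dijkstra(g, "s", order.append)
print(dist, order)
```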
BGL-Python Summary
BGL-Python is all about tradeoffs:
  more gradual learning curve
  faster time-to-solution
  lower performance
Our typical approach:
  1. Prototype in Python to get your ideas down.
  2. Port to C++ when performance matters.
The Parallel BGL
A version of the C++ BGL for computational clusters:
  distributed memory for huge graphs
  parallel processing for improved performance
An active research project
Closely related to the original BGL: parallelizing BGL programs should be "easy"
Parallel BGL: Distributed Graphs
[Figure: a simple directed graph, shown distributed across 3 processors.]
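One simple way to spread vertices across processors is a block distribution, in which each processor owns a contiguous range of vertex numbers. The following sketch uses hypothetical helper names and is not the Parallel BGL's own mechanism; it maps a global vertex number to its owning processor and local index, and back.

```python
from typing import Tuple


def block_size(n: int, p: int) -> int:
    """Vertices per processor for n vertices on p processors (last block may be smaller)."""
    return -(-n // p)  # ceiling division


def owner_and_local(v: int, n: int, p: int) -> Tuple[int, int]:
    """Map global vertex v to (owning processor, index local to that processor)."""
    b = block_size(n, p)
    return v // b, v % b


def global_index(owner: int, local: int, n: int, p: int) -> int:
    """Inverse of owner_and_local."""
    return owner * block_size(n, p) + local


n, p = 10, 3
print([owner_and_local(v, n, p) for v in range(n)])
```

With 10 vertices on 3 processors, processors 0 and 1 each own 4 vertices and processor 2 owns the remaining 2. Any processor can compute an owner locally, without communication, which is the main appeal of such a fixed scheme.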
Parallel Graph Algorithms
Breadth-first search
Eager Dijkstra's single-source shortest paths
Crauser et al. single-source shortest paths
Depth-first search
Minimum spanning tree (Boruvka, Dehne & Götz)
Connected components
Strongly connected components
Biconnected components
PageRank
Graph coloring
Fruchterman-Reingold layout
Max-flow (Dinic's)
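The Boruvka-based minimum spanning tree variants share one idea: in each round, every component picks its cheapest outgoing edge, and all of those edges are added at once. Because the per-component choices are independent, a round parallelizes naturally. The sketch below is a sequential Python version using union-find; it is an illustration of the basic technique, not the Parallel BGL's distributed implementation, and it breaks weight ties by edge index so that no cycle can form.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def boruvka_msf(n: int, edges: List[Edge]) -> List[Edge]:
    """Minimum spanning forest by Boruvka rounds; ties broken by edge index."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest: List[Edge] = []
    while True:
        # For each component root, the index of its lightest outgoing edge.
        cheapest: Dict[int, int] = {}
        for i, (w, u, v) in enumerate(edges):
            cu, cv = find(u), find(v)
            if cu == cv:
                continue  # internal edge, cannot join two components
            for c in (cu, cv):
                best = cheapest.get(c)
                if best is None or (w, i) < (edges[best][0], best):
                    cheapest[c] = i
        if not cheapest:
            return forest  # no edge joins two components: finished
        # Merge along every chosen edge (two components may pick the same one).
        for i in sorted(set(cheapest.values())):
            w, u, v = edges[i]
            cu, cv = find(u), find(v)
            if cu != cv:
                parent[cu] = cv
                forest.append(edges[i])


edges = [(1.0, 0, 1), (4.0, 1, 2), (3.0, 0, 2), (2.0, 2, 3), (5.0, 1, 3)]
msf = boruvka_msf(4, edges)
print(msf, sum(w for w, _, _ in msf))
```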
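Performance: Sparse graphs
[Figure: wall clock time in seconds (log scale, 0.1 to 1000) versus number of processors (log scale, 1 to 100) for breadth-first search, Crauser et al. and eager Dijkstra shortest paths, the Dense Boruvka, Merging Local MSFs, Boruvka-Then-Merge, and Boruvka-Mixed-Merge spanning tree variants, and Boman et al. coloring.]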
Scalability (~547k vertices/node)
[Figure: wall clock time in seconds (0 to 400) versus number of processors (0 to 150) for breadth-first search, Crauser et al. shortest paths, eager Dijkstra shortest paths, connected components, and vertex coloring on a small-world graph with up to 70M vertices and 1B edges.]
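The figure's annotations can be cross-checked with a little arithmetic. Assuming each processor corresponds to one node and the per-node load stays fixed at about 547k vertices as processors are added (our reading of the title), the largest 70M-vertex run corresponds to roughly 128 processors, and 1B edges over 70M vertices is about 14 edges per vertex:

```latex
\frac{70 \times 10^{6}\ \text{vertices}}{5.47 \times 10^{5}\ \text{vertices/processor}} \approx 128\ \text{processors},
\qquad
\frac{1 \times 10^{9}\ \text{edges}}{70 \times 10^{6}\ \text{vertices}} \approx 14.3\ \text{edges per vertex}
```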
Performance vs. CGMgraph
[Figure: comparison with CGMgraph on an Erdős-Rényi graph with 96k vertices and 10M edges; the chart is annotated with speedups of 17x and 30x.]
Parallel BGL Summary
The Parallel BGL is built for huge graphs:
  millions to hundreds of millions of nodes
  distributed-memory parallel processing on clusters
  future work will permit even larger graphs
Parallel programming has a learning curve:
  parallel graph algorithms are much harder to write
  distributed graph manipulation can be tricky
The Parallel BGL is an active research library.
Distributed Graph Layout
Parallel BGL in Python
Preliminary support for the Parallel BGL in Python:
  just import boost.graph.distributed
  similar interface to sequential BGL-Python
Several options for usage with MPI:
  straight MPI: mpirun -np 2 python script.py
  pyMPI: allows interactive use of the interpreter
Initially used to prototype our distributed Fruchterman-Reingold implementation.
Porting for Performance
Which BGL is Right for You?
Is any BGL right for you? It depends on how large your networks are:
  up to 1/2 million vertices: any BGL will do
  the C++ BGL can push to a couple million vertices
  tens of millions of vertices or more: Parallel BGL only
Other considerations:
  you can prototype in Python, then port to C++
  algorithm authors might prefer the original BGL
  parallelism is very hard to manage
Conclusion
The Boost Graph Library family is a collection of full-featured graph libraries:
  all are flexible, customizable, and efficient
  easy to port from Python to C++
  can port from sequential to parallel
  always growing and improving
Is one of the BGLs right for you? It is a typical "build or buy" decision.
For More Information…
(Original) Boost Graph Library: http://www.boost.org/libs/graph/doc
Parallel Boost Graph Library: http://www.osl.iu.edu/research/pbgl
Python Bindings for (Parallel) BGL: http://www.osl.iu.edu/~dgregor/bgl-python
Contact us!
  Douglas Gregor <[email protected]>
  Andrew Lumsdaine <[email protected]>
Other BGL Variants
QuickGraph (C#): http://www.codeproject.com/cs/miscctrl/quickgraph.asp
Ruby Graph Library: http://rubyforge.org/projects/rgl/
Rooster Graph (Scheme): http://savannah.nongnu.org/projects/rgraph/
RBGL (an R interface to the C++ BGL): http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
Disclaimer: These are all separate projects. We do not maintain them.
Comparative Performance
[Figure: BC clustering performance, BGL vs. JUNG. Wall clock time in minutes (0 to 60) versus number of movies (200 to 400).]
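We assume the BC clustering benchmarked here is the Girvan-Newman style method: repeatedly remove the edge with the highest betweenness (the edge lying on the most shortest paths) until the graph splits into communities. The expensive step is computing edge betweenness, which the sketch below does for an unweighted, undirected graph using Brandes' dependency accumulation. It is a plain Python illustration, not the BGL or JUNG code behind the chart; each unordered vertex pair is counted once.

```python
from collections import deque
from typing import Dict, FrozenSet, List


def edge_betweenness(adj: Dict[str, List[str]]) -> Dict[FrozenSet[str], float]:
    """Edge betweenness of an unweighted undirected graph (Brandes-style)."""
    bc: Dict[FrozenSet[str], float] = {
        frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        order: List[str] = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Farthest vertices first: push each vertex's dependency to its predecessors.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1.0 + delta[w])
                bc[frozenset((u, w))] += c
                delta[u] += c
    # Every pair (s, t) was counted once from s and once from t.
    return {e: val / 2.0 for e, val in bc.items()}


# Two triangles joined by the bridge c-d.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"]}
bc = edge_betweenness(adj)
print(max(bc, key=bc.get), bc[frozenset(("c", "d"))])
```

A clustering loop would delete the top-scoring edge (here the bridge, which carries all 9 shortest paths between the two triangles), recompute, and repeat.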