Graph computation
![Page 1: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/1.jpg)
Graph Computation
Naveen Molleti, Sigmoid
![Page 2: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/2.jpg)
Graph of the Internet
Source: INRIA (http://raweb.inria.fr/rapportsactivite/RA2009/gravite/uid59.html)
![Page 3: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/3.jpg)
Red Hat family tree rendered along with an axis
Source: Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Redhat_family_tree_11-06.png)
![Page 4: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/4.jpg)
Tabular structure (rows, fields, values) => ? => Graph structure (vertices, edges, labels, properties)
Graph computation
![Page 5: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/5.jpg)
| Customer ID | Customer Name | Bill ID | Item Name |
| --- | --- | --- | --- |
| 391 | Naveen | 137 | Pizza |
| 391 | Naveen | 137 | Coke |
| 391 | Naveen | 139 | Garlic Bread |
| 393 | Rahul | 154 | Garlic Bread |
| 393 | Rahul | 154 | Coke |
| 391 | Naveen | 193 | Coke |

Table data
![Page 6: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/6.jpg)
Compute configuration
Specify type of edges to be created:
(Customer ID: Customer Name) => Bill ID
Bill ID => Item Name
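A minimal sketch of how such a configuration might be applied to the table data, written in plain Scala. The `Row` and `EdgeRule` types, the `edgesFor` helper and the edge labels `billed` and `contains` are hypothetical names chosen for illustration; the talk does not show its actual configuration format.

```scala
object EdgeConfigSketch {
  // One row of the example bill table.
  final case class Row(customerId: Int, customerName: String, billId: Int, itemName: String)

  // A rule maps a row to the names of a source and a target vertex.
  // The label is an assumption; the slides do not name the edge types.
  final case class EdgeRule(label: String, from: Row => String, to: Row => String)

  val rules: Seq[EdgeRule] = Seq(
    // (Customer ID: Customer Name) => Bill ID
    EdgeRule("billed", r => s"${r.customerId}: ${r.customerName}", r => r.billId.toString),
    // Bill ID => Item Name
    EdgeRule("contains", r => r.billId.toString, r => r.itemName)
  )

  val rows: Seq[Row] = Seq(
    Row(391, "Naveen", 137, "Pizza"),
    Row(391, "Naveen", 137, "Coke"),
    Row(391, "Naveen", 139, "Garlic Bread"),
    Row(393, "Rahul", 154, "Garlic Bread"),
    Row(393, "Rahul", 154, "Coke"),
    Row(391, "Naveen", 193, "Coke")
  )

  // Apply every rule to every row; `distinct` drops edges that repeat across rows.
  def edgesFor(rows: Seq[Row]): Seq[(String, String, String)] =
    rows.flatMap(r => rules.map(rule => (rule.from(r), rule.label, rule.to(r)))).distinct
}
```

Calling `EdgeConfigSketch.edgesFor(EdgeConfigSketch.rows)` yields one `(source, label, target)` triple per distinct edge, so a customer who appears on several rows of the same bill produces a single customer-to-bill edge.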
![Page 7: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/7.jpg)
![Page 8: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/8.jpg)
Raw data
Ingest data Compute Insert graph
Configuration
Persistence
![Page 9: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/9.jpg)
Raw data
Ingest data Compute Insert graph
Configuration
Persistence
HDFS Spark
HDFS
Titan TinkerPop
Cassandra
![Page 10: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/10.jpg)
Graph data structures
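import scala.collection.immutable

// An edge between an `out` vertex and an `in` vertex, with a label and properties.
trait Edge {
  def out: Vertex
  def in: Vertex
  def props: Map[String, AnyRef]
  def label: String
}

// A vertex with an identifier, a name and arbitrary properties.
trait Vertex {
  def name: String
  def id: String
  def props: Map[String, AnyRef]
}

// A graph stored as an adjacency list: each vertex maps to its edges.
trait Graph {
  def adjList: immutable.Map[Vertex, Seq[Edge]]
}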
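To make the traits concrete, here is a small sketch with case-class implementations and a three-vertex graph. `SimpleVertex`, `SimpleEdge`, `SimpleGraph` and `neighbours` are hypothetical names, not from the talk, and the sketch assumes the TinkerPop convention that an edge points from its `out` vertex to its `in` vertex. The traits are repeated so that the block compiles on its own.

```scala
import scala.collection.immutable

object GraphTypesSketch {
  trait Vertex { def name: String; def id: String; def props: Map[String, AnyRef] }
  trait Edge { def out: Vertex; def in: Vertex; def props: Map[String, AnyRef]; def label: String }
  trait Graph { def adjList: immutable.Map[Vertex, Seq[Edge]] }

  // Case classes give structural equality, so vertices work as keys of the adjacency map.
  final case class SimpleVertex(id: String, name: String,
                                props: Map[String, AnyRef] = Map.empty) extends Vertex
  final case class SimpleEdge(out: Vertex, in: Vertex, label: String,
                              props: Map[String, AnyRef] = Map.empty) extends Edge
  final case class SimpleGraph(adjList: immutable.Map[Vertex, Seq[Edge]]) extends Graph

  val customer = SimpleVertex("391", "Naveen")
  val bill = SimpleVertex("137", "Bill 137")
  val pizza = SimpleVertex("Pizza", "Pizza")

  // Each vertex maps to its outgoing edges (assumed direction: out -> in).
  val graph = SimpleGraph(immutable.Map[Vertex, Seq[Edge]](
    customer -> Seq(SimpleEdge(customer, bill, "billed")),
    bill -> Seq(SimpleEdge(bill, pizza, "contains")),
    pizza -> Seq.empty
  ))

  // Names of the vertices directly reachable from v.
  def neighbours(g: Graph, v: Vertex): Seq[String] =
    g.adjList.getOrElse(v, Seq.empty).map(_.in.name)
}
```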
![Page 11: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/11.jpg)
Compute
data
tokens + relations
vertices + edges
![Page 12: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/12.jpg)
Compute - simple map reduce approach
0) Split data into partitions
1) For each partition, compute tokens and relations
2) Create vertices and edges, and adjacency lists (local subgraphs)
3) Merge adjacency lists using groupBy vertices
4) Merge duplicate edges within adjacency list
5) Result is final graph
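The steps above can be sketched with plain Scala collections standing in for Spark partitions. This is an illustration under stated assumptions rather than the speaker's implementation: vertices are represented by their names only, the `billed` and `contains` labels are hypothetical, and `localSubgraph`, `merge` and `build` are names invented for the sketch.

```scala
object MapReduceSketch {
  // An edge between two vertex names; labels are illustrative only.
  final case class Edge(out: String, in: String, label: String)
  type AdjList = Map[String, Seq[Edge]]

  // Steps 1-2: turn one partition of (customer, bill, item) rows into a local subgraph.
  def localSubgraph(partition: Seq[(String, String, String)]): AdjList = {
    val edges = partition.flatMap { case (customer, bill, item) =>
      Seq(Edge(customer, bill, "billed"), Edge(bill, item, "contains"))
    }
    // Adjacency list keyed by source vertex; leaf vertices appear only as edge targets.
    edges.groupBy(_.out)
  }

  // Steps 3-4: merge adjacency lists by vertex, then drop duplicate edges.
  def merge(subgraphs: Seq[AdjList]): AdjList =
    subgraphs.flatMap(_.toSeq).groupBy(_._1).map { case (vertex, lists) =>
      vertex -> lists.flatMap(_._2).distinct
    }

  // Steps 0 and 5: split the rows into partitions, map each one, and reduce.
  def build(rows: Seq[(String, String, String)], partitionSize: Int): AdjList =
    merge(rows.grouped(partitionSize).toList.map(localSubgraph))
}
```

Because duplicates are removed after the merge, the resulting set of edges does not depend on how the rows were split into partitions.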
![Page 13: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/13.jpg)
DATA
Chunk... ...
tokens relations
vertices edges
subgraph subgraph subgraph subgraph
GRAPH
map step
reduce step
transformation step
![Page 14: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/14.jpg)
Tweaking for memory
- Maintaining vertex and edge objects consumes a lot of memory, both on the application server and on the Spark master/workers
- Moving objects around on the network is costly too
Solution: Compute on ‘aliases’. Create the objects corresponding to each alias only just before returning.
- After-effects of merging duplicate objects: GC! (which opens another box of problems)
Solution: Avoid all duplicate objects as far as possible.
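The slides do not say what form the aliases take. One possible reading, sketched below, is to replace each vertex name with a compact `Int`, do all merging and de-duplication on those integers, and build full vertex objects only once at the end. All names here (`AliasSketch`, `aliases`, `aliasEdges`, `materialise`) are hypothetical.

```scala
object AliasSketch {
  // Full vertex object, created only at the very end.
  final case class Vertex(id: Int, name: String)

  // Assign each distinct name a compact Int alias, in order of first appearance.
  def aliases(names: Seq[String]): Map[String, Int] =
    names.distinct.zipWithIndex.toMap

  // Compute on aliases only: edges are pairs of Ints, de-duplicated by the Set.
  def aliasEdges(pairs: Seq[(String, String)], alias: Map[String, Int]): Set[(Int, Int)] =
    pairs.map { case (from, to) => (alias(from), alias(to)) }.toSet

  // Materialise vertex objects once, just before returning the result.
  def materialise(edges: Set[(Int, Int)], alias: Map[String, Int]): Set[(Vertex, Vertex)] = {
    val byId = alias.map { case (name, id) => id -> Vertex(id, name) }
    edges.map { case (from, to) => (byId(from), byId(to)) }
  }
}
```

Working on integer pairs keeps intermediate results small, and duplicates collapse in the `Set` before any vertex object exists, so there are fewer short-lived objects for the garbage collector to clean up.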
![Page 15: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/15.jpg)
DATA
GRAPH
Chunk... ...
tokens relations
subcompute subcompute subcompute ... ...
compute result
map step
reduce step
transformation step
![Page 16: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/16.jpg)
http://aa.bb.cc.dd:8000/graph/zzgraph/search?name=mr%20vijay&depth=2&limit=10
![Page 17: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/17.jpg)
Not enough memory?
- Xmx values on a forked JVM launched via SBT (fork := true)
  - Set the javaOptions key (e.g. javaOptions += "-Xmx16G")
- Underestimated size of the Spark compute result
  - Set spark.driver.maxResultSize
- Get the most out of your machine. Don’t let the OS kill the process under memory pressure.
  - Set vm.panic_on_oom (echo 1 | sudo tee /proc/sys/vm/panic_on_oom), which makes the kernel panic on out-of-memory instead of invoking the OOM killer
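For the SBT point, a build definition along these lines is one way to pass the heap size to the forked JVM. It is a fragment to adapt rather than a complete build; the 16G value is simply the figure from the slide.

```scala
// build.sbt (fragment)
fork := true                    // run the application in a separate, forked JVM
javaOptions ++= Seq("-Xmx16G")  // these options apply to the forked JVM only
```

`javaOptions` has no effect unless forking is enabled, because without a fork the application runs inside the JVM that SBT itself was started in.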
![Page 18: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/18.jpg)
?
Graph
Database
![Page 19: Graph computation](https://reader031.fdocuments.us/reader031/viewer/2022022414/58711e701a28abe4448b4825/html5/thumbnails/19.jpg)
References
Titan: http://thinkaurelius.github.io/titan/
TinkerPop: http://tinkerpop.apache.org/
Cassandra: http://cassandra.apache.org/