LLAMA: Efficient graph analytics using Large...
Transcript of LLAMA: Efficient graph analytics using Large...
LLAMA: Efficient graph analytics using Large Multiversioned Arrays
Presenters:Bryan Wilder, Ethan Kripke, Zach Dietz
Background
Mickens, James. “The Saddest Moment.” ;Login: Logout, (May 2013).
Big-Data Graph Analytics Applications
Big-Data Graph Analytics Applications
FriendBook
• • •
GoggleDon’t Not Not Be Not Evil™
123
Design Considerations
Compressed Sparse Row
Adjacency Lists
Compressed Sparse Bitmaps❓
❓
❓
Design Considerations
Compressed Sparse Row
Mutability
Multi-versioning
In-Memory vs. Out of Memory
Overhead
Compressed Sparse Row (CSR)
0 4 0 0 0 0 0
0 0 2 8 0 0 1
0 0 0 0 0 0 0
0 0 0 1 0 3 0
[ 4 ]
[ 2, 8, 1 ]
[ 1, 3 ]
[ ]
0 1 2 3
[ 0, 1, 4, 4, 6 ]
Compressed Sparse Row (CSR)
0 4 0 0 0 0 0
0 0 2 8 0 0 1
0 0 0 0 0 0 0
0 0 0 1 0 3 0 Value
Column
Cum. NNZ
(implicit: row)
[ 1, 2, 3, 6, 3, 5 ]
[ 4, 2, 8, 1, 1, 3 ]
Compressed Sparse Row (CSR)
0
1 2 [ 1, 2, 2, 0, 2 ]Col
[ 0, 2, 3, 5 ]NNZ
(row) 0 1 2
0 1 1
0 0 1
1 0 1
Compressed Sparse Row (CSR)
0
1 2 [ 1, 2, 2, 0, 2 ]
[ 0, 2, 3, 5 ]0 1 1
0 0 1
1 0 1
VertexTable
Edge Table
• • •
• • •
Implementation
Snapshot 0 0 1 2 3
2 3 0 2
Vertex Table
Edge TableSnapshot 0Vertex 0
Offset: 0Length: 2
Snapshot 0Vertex 1
Offset: 2Length: 1
Snapshot 0Vertex 2
Offset: 3Length: 0
Snapshot 0Vertex 3
Offset: 3Length: 1
2 3 0 2
Vertex Table
Edge Table
Blue Red Green BlueEdge Property
Tall Short Skinny ShortVertex Property
0 0 0 0Deletions
Indirection Array(pointers to pages in VT)
0-1 2-3
Page 0 Page 1
< L3 Cache
multiple of file system & virtual mem page size
Snapshot 0 0 1 2 3
2 3 0 2
Vertex Table
Edge Table
Snapshot 1 0 1
3 Continuation:NONE
Indirection Table 0-1 2-3
Page 0 Page 1 Page 0’
3 Continuation:Snapshot 0, Offset 2
0-1 2-3
Reading? Merging?
Rule for merging?
Using LLAMAExposes three operations to user:
Iterate over verticesSelect a specific vertexIterate over neighbors of a vertex
Contrast: does not enforce sequential access pattern
(unlike GraphChi or X-stream)
ExperimentalEvaluation
TasksPageRank computation
Breadth-first search
Triangle counting
TasksPageRank computation
Breadth-first search
Triangle counting
DatasetsLarge synthetic graphs (R-MAT)
Large, publicly available social graphs (Twitter, Livejournal)
TasksPageRank computation
Breadth-first search
Triangle counting
DatasetsLarge synthetic graphs (R-MAT)
Large, publicly available social graphs (Twitter, Livejournal)
In memory vs out of memory
PageRank, in-memory
In memory Out of memory
PageRank, out-of-memory
Runtime breakdown
Enormous variation!
LLAMA minimizes overhead
How is this possible? What features of LLAMA’s design essentially eliminate overhead like buffer management?
Overhead of snapshots
LLAMA
GraphChi (for reference)
PageRank Time to merge snapshots
Merges are IO-bound
Scalability in number of cores (PageRank)
Experimental evaluationCompare to in-memory and out-of-memory systems
(GreenMarl and GraphLab, GraphChi and X-stream)
Evaluate multi-version support
How much overhead from snapshots?
Evaluate scalability on different datasets
Experimental takeawaysLLAMA performs well both in and out of memory
Significantly reduces existing overhead
Introduces manageable additional overhead
Thank you!
How can we store variable length properties?
Can we create a hybrid between a row- and column-store with the storage of properties?