LLAMA: Efficient graph analytics using Large...

28
LLAMA: Efficient graph analytics using Large Multiversioned Arrays Presenters: Bryan Wilder, Ethan Kripke, Zach Dietz

Transcript of LLAMA: Efficient graph analytics using Large...

Page 1: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

LLAMA: Efficient graph analytics using Large Multiversioned Arrays

Presenters:Bryan Wilder, Ethan Kripke, Zach Dietz

Page 2: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Background

Page 3: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Mickens, James. “The Saddest Moment.” ;Login: Logout, (May 2013).

Big-Data Graph Analytics Applications

Page 4: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Big-Data Graph Analytics Applications

FriendBook

• • •

GoggleDon’t Not Not Be Not Evil™

123

Page 5: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Design Considerations

Compressed Sparse Row

Adjacency Lists

Compressed Sparse Bitmaps❓

Page 6: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Design Considerations

Compressed Sparse Row

Mutability

Multi-versioning

In-Memory vs. Out of Memory

Overhead

Page 7: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Compressed Sparse Row (CSR)

0 4 0 0 0 0 0

0 0 2 8 0 0 1

0 0 0 0 0 0 0

0 0 0 1 0 3 0

[ 4 ]

[ 2, 8, 1 ]

[ 1, 3 ]

[ ]

Page 8: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

0 1 2 3

[ 0, 1, 4, 4, 6 ]

Compressed Sparse Row (CSR)

0 4 0 0 0 0 0

0 0 2 8 0 0 1

0 0 0 0 0 0 0

0 0 0 1 0 3 0 Value

Column

Cum. NNZ

(implicit: row)

[ 1, 2, 3, 6, 3, 5 ]

[ 4, 2, 8, 1, 1, 3 ]

Page 9: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Compressed Sparse Row (CSR)

0

1 2 [ 1, 2, 2, 0, 2 ]Col

[ 0, 2, 3, 5 ]NNZ

(row) 0 1 2

0 1 1

0 0 1

1 0 1

Page 10: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Compressed Sparse Row (CSR)

0

1 2 [ 1, 2, 2, 0, 2 ]

[ 0, 2, 3, 5 ]0 1 1

0 0 1

1 0 1

VertexTable

Edge Table

• • •

• • •

Page 11: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Implementation

Page 12: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Snapshot 0 0 1 2 3

2 3 0 2

Vertex Table

Edge TableSnapshot 0Vertex 0

Offset: 0Length: 2

Snapshot 0Vertex 1

Offset: 2Length: 1

Snapshot 0Vertex 2

Offset: 3Length: 0

Snapshot 0Vertex 3

Offset: 3Length: 1

2 3 0 2

Vertex Table

Edge Table

Blue Red Green BlueEdge Property

Tall Short Skinny ShortVertex Property

0 0 0 0Deletions

Indirection Array(pointers to pages in VT)

0-1 2-3

Page 0 Page 1

< L3 Cache

multiple of file system & virtual mem page size

Page 13: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Snapshot 0 0 1 2 3

2 3 0 2

Vertex Table

Edge Table

Snapshot 1 0 1

3 Continuation:NONE

Indirection Table 0-1 2-3

Page 0 Page 1 Page 0’

3 Continuation:Snapshot 0, Offset 2

0-1 2-3

Reading? Merging?

Rule for merging?

Page 14: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Using LLAMAExposes three operations to user:

Iterate over verticesSelect a specific vertexIterate over neighbors of a vertex

Contrast: does not enforce sequential access pattern

(unlike GraphChi or X-stream)

Page 15: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

ExperimentalEvaluation

Page 16: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

TasksPageRank computation

Breadth-first search

Triangle counting

Page 17: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

TasksPageRank computation

Breadth-first search

Triangle counting

DatasetsLarge synthetic graphs (R-MAT)

Large, publicly available social graphs (Twitter, Livejournal)

Page 18: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

TasksPageRank computation

Breadth-first search

Triangle counting

DatasetsLarge synthetic graphs (R-MAT)

Large, publicly available social graphs (Twitter, Livejournal)

In memory vs out of memory

Page 19: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

PageRank, in-memory

In memory Out of memory

Page 20: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

PageRank, out-of-memory

Page 21: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Runtime breakdown

Enormous variation!

LLAMA minimizes overhead

Page 22: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

How is this possible? What features of LLAMA’s design essentially eliminate overhead like buffer management?

Page 23: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Overhead of snapshots

LLAMA

GraphChi (for reference)

PageRank Time to merge snapshots

Merges are IO-bound

Page 24: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Scalability in number of cores (PageRank)

Page 25: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Experimental evaluationCompare to in-memory and out-of-memory systems

(GreenMarl and GraphLab, GraphChi and X-stream)

Evaluate multi-version support

How much overhead from snapshots?

Evaluate scalability on different datasets

Page 26: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Experimental takeawaysLLAMA performs well both in and out of memory

Significantly reduces existing overhead

Introduces manageable additional overhead

Page 27: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

Thank you!

Page 28: LLAMA: Efficient graph analytics using Large ...daslab.seas.harvard.edu/classes/cs265/files/presentations/2020/CS2… · LLAMA: Efficient graph analytics using Large Multiversioned

How can we store variable length properties?

Can we create a hybrid between a row- and column-store with the storage of properties?