cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms...

30
cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs Oded Green & David Bader

Transcript of cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms...

Page 1: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

cuSTINGER - Supporting Dynamic Graph

Algorithms for GPUs

Oded Green & David Bader

Page 2: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

What we will see today

• The first dynamic graph data structure for the GPU.

– Scalable in size

– Supports the same functionality is its CPU

counterpart

• Supports extremely fast update rates.

• Good performance for static graph algorithms.

Oded Green, HPEC'16 2

Page 3: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Big Data problems

need Graph Analysis

Communication networks:

• World-wide connectivity

• High velocity changes

• Different types of extracted

data:– Physical communication network.

– Person-to-person communication network.

Oded Green, HPEC'16 3

Health-Care networks:

• Various players.

• Pattern matching and

epidemic monitoring.

• Problem sizes have

doubled in last 5 years.

Financial networks:

• Transactions between

players.

• Different transactions

types (property graph)

Graphs are a unifying motif for data analytics.

More importantly are dynamic and streaming graphs!

Page 4: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Definitions

• STINGER: Spatio-Temporal Interaction

Networks and Graphs (STING) Extensible

Representation

• Dynamic graphs

– Graph can change over time.

– Changes can be to topology, edges, or vertices.

• For example new edges between two vertices.

• Streaming graphs:

– Graphs changing at high rates.

– 100s of thousands of updates per second.

Oded Green, HPEC'16 4

Page 5: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Streaming graph example

• Only a subset of the entire graph…

• Dynamic/Streaming:

– At time �:

• � and � become friends.

• �����_������, ��

– At time �̂:

• � and� nolongerfriends

• d�����_���� �, �

Oded Green, HPEC'16 5

����

��

Page 6: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

STING Extensible Representation

• Semi-dense edge list blocks with free space

• Supports property graphs (vertex & edge type, vertex & edge weights, time-stamps, and more).

• Maps from application IDs to storage IDs

Oded Green, HPEC'16 6

Page 7: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

STINGER

• Enable algorithm designers to implement

dynamic & streaming graph algorithms with

ease.

• Portable semantics for various platforms

– Linked list of edge blocks not ideal for the GPU

• Good performance for all types of graph

problems and algorithms - static and dynamic.

• Assumes globally addressable memory access

Oded Green, HPEC'16 7

Page 8: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

STINGER and cuSTINGER Properties

� A Simple programming model

� Millions of updates per second to graph

�Updates are not bottlenecks for analytics.

�Hundreds of thousands of updates per second for

numerous analytics.W

� Advanced memory manager

� Transfers data between host and device automatically

� Reduces initialization time

� Allows for simple update processes

Main Papers: [Bader et al.; 2007; Tech Report]

[Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]

Oded Green, HPEC'16 8

Page 9: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Lots of great graph libraries

CPU-based

• Galois

• Ligra

• LLAMA

• STINGER

– DISTINGER

Oded Green, HPEC'16 9

GPU-based

• Gunrock

• GasCL

• BelRed

• BlazeGraph

Most of these target STATIC graphs and use CSR

Page 10: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Compressed Sparse Row (CSR)

Pros:

• Uses precise storage requirements

• Great locality– Good for GPUs

• Handful of arrays– Simple to use and

manage

Cons:

• Inflexible.

• Network growth unsupported

• Topology changes unsupported

• Property graphs not supported

Oded Green, HPEC'16 10

Vertex Weight:

Offset:

0 1 2 3 #

Destination:

Edge Weight:

$

Legend: Optional Field Mandatory

Field

Page 11: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

cuSTINGER – Data Structure

• Great locality– STINGER uses an

Array of Structures

(AOS)

– cuSTINGER uses a

Structure of Arrays

(SOA)

• Each vertex has its own adjacency list

• Can compact data similar to CSR.

Oded Green, HPEC'16 11

Legend: Optional Field Mandatory

Field

0 1 2 3 V

Destination:

Used

Used:

Allocated:

Pointer:

Page 12: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

cuSTINGER – Supports Growth

• Great locality

• Supports updates– Supports edge insertion

and deletion

– Supports vertex

insertion and deletion

Oded Green, HPEC'16 12

Legend: Optional Field Mandatory

Field

Used:

Allocated:

Pointer:

0 1 2 3 V #&

Destination:

Allocated

Used

Page 13: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

cuSTINGER – Allocation modes

• Great locality

• Supports updates– Supports edge insertion

and deletion

– Supports vertex

insertion and deletion

• Supports multiple allocation modes

– Runtime configurable

Oded Green, HPEC'16 13

Legend: Optional Field Mandatory

Field

Used:

Allocated:

Pointer:

0 1 2 3 V #&

Destination:

Allocated

Used

Destination:

Allocated

Used

Option 1:

Option 2:

Page 14: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Used:

Allocated:

Vertex Weight:

Vertex Type:

Pointer:

0 1 2 3 V

Destination:

Edge Weight:

Edge Type:

Time Stamp 1:

Time Stamp 2:

cuSTINGER – Supports Properties

• Great locality

• Supports updates– Supports edge insertion

and deletion

– Supports vertex

insertion and deletion

• Supports multiple allocation modes

• Supports STINGER properties

Oded Green, HPEC'16 14

Legend: Optional Field Mandatory

Field

Allocated

Used

Page 15: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Edge Insertions

• Given an edge update, � =

�()* , �+,(- :

– Check that edge doesn’t already exist

– Check for available space

– Increment “used” and append to end

– Adjacency list is not sorted

• Updates are done in batches

– Better utilization

– Requires identifying two identical edges in a batch.

Oded Green, HPEC'16 15

�()*

Destination:

Edge Weight:

Edge Type:

Time Stamp 1:

Time Stamp 2:

Legend: Optional Field Mandatory

Field

Allocated

Used

Page 16: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Edge Insertions –

Out of Memory

• Given an edge update, � =

�()* , �+,(-

• Adjacency list is full

• Allocate new list

• Copy old list into new list

• Append to end

Oded Green, HPEC'16 16

Destination:

Edge Weight:

Edge Type:

Time Stamp 1:

Time Stamp 2:

Legend: Optional Field Mandatory

Field

Allocated

Used

�()*

Destination:

Edge Weight:

Edge Type:

Time Stamp 1:

Time Stamp 2:

Allocated

Used

Page 17: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Experiment Setup

• NVIDIA K40 GPU

– Kepler micro-architecture

– 15 SMs, total of 2880 SPs

– 12GB of RAM

• Intel i7-4770K

– Haswell micro-architecture

– Quad core

– 8MB L3 cache

– 32GB of RAM

Oded Green, HPEC'16 17

Page 18: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Inputs Graphs

• DIMACS 10 Graph Implementation Challenge

• SNAP – Stanford Network Analysis Project

Oded Green, HPEC'16 18

Name Type |/| |0| Source

123��42�5678 Collaboration 299: 1.95= DIMACS

>� − �:���� Trace route 1.69= 11.1= SNAP

:2�_21 Random 2= 201= DIMACS

1�� − A>����� Citation 3.77= 16.5= SNAP

1>��15 Matrix 5.15= 94= DIMACS

�: − 2002 Webcrawl 18.52= 523= DIMACS

Page 19: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Experiment metrics

• Initialization time

– Preferably as small as possible

• Update rate

– Number of updates per second that

cuSTINGER can sustain

• Static graph support

– We compare a clustering-coefficient

implementation using CSR with a

CUSTINGER implementation

Oded Green, HPEC'16 19

Page 20: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Initialization Time

• Time correlated with number of vertices

Oded Green, HPEC'16 20

Page 21: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Update rate – Small Batches

• Updating a single edge

at a time:

– 15F updates per second

– Same rate for insertions and deletions

• For small batches

– Upto 1000 edges per batch

– Millions of updates per second

Oded Green, HPEC'16 21

Page 22: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Insertion rate – Large Batches

• Increase chance of

vertex not having

enough storage

available.

• Some structures are

copied back from device

to host

– Overhead is big for mid-size batches.

– Overhead “���>AA�>�” for larger batches.

Oded Green, HPEC'16 22

Page 23: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Deletion rate – Large Batches

• Performance is

consistent for all graphs

and unique batches.

• No memory allocation or

de-allocation are

required.

– Unlike for the insertions case.

• Currently, memory

reclamation is not

supported.

Oded Green, HPEC'16 23

Page 24: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Triangle Counting – Static Graph

• Algorithm taken from [Green et al; I3J;2014]

• Simply replace CSR accesses with cuSTINGER

• Execution times are similar

Oded Green, HPEC'16 24

Name |/| |0| Time-CSR

(sec.)

Time-

cuSTINGER

(sec.)

Execution

Difference

123��42�5678 299: 1.95= 0.218 0.242 +10%

>� − �:���� 1.69= 11.1= 57.14 59.37 +3.8%

:2�_21 2= 201= 2992 2996 +0.14%

1�� − A>����� 3.77= 16.5= 0.814 0.830 +2%

1>��15 5.15= 94= 6.544 7.204 +10%

�: − 2002 18.52= 523= 424.9 431.4 +1.6%

Page 25: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Summary

• cuSTINGER supports high update rates

• Memory manager

– Responsible for allocating and transferring

data on/from device

– Reduces initialization time

– Programmers can focus on algorithms instead

of complex data management

• Great performance for both dynamic and static graph algorithms

Oded Green, HPEC'16 25

Page 26: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Acknowledgments

• Devavret Makkar, Graduate Student (Georgia Tech)

Oded Green, HPEC'16 26

Page 27: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Acknowledgment of Support

Oded Green, HPEC'16 27

Page 28: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Thank you

Oded Green, HPEC'16 28

• Email: [email protected]

• STINGER:–Documentation: http://stingergraph.com/

• cuSTINGER

–Coming soon…

Page 29: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Backup Slides

Oded Green, HPEC'16 29

Page 30: cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs · dynamic & streaming graph algorithms with ease. • Portable semantics for various platforms – Linked list of edge blocks

Array of Structures Vs.

Structure of Arrays

Oded Green, HPEC'16 30

STINGER (AOS) cuSTINGER

(SOA)Used:

Allocated:

Vertex Weight:

Vertex Type:

Pointer:

0 1 2 3 V

90° degrees