How well do CPU, GPU and Hybrid Graph Processing ...

Posted on 13-Nov-2021



How well do CPU, GPU and Hybrid Graph Processing Frameworks Perform?

Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu
Networked Systems Laboratory (NetSysLab), University of British Columbia


A golf course…

…a (nudist) beach

(…and 199 days of rain each year)

Graphs are Everywhere


1B users, 150B friendships

100B neurons, 700T connections

Challenges in Graph Processing

• Data-dependent memory access patterns
• Large memory footprint
• Poor locality
• Low compute-to-memory access ratio
• Varying degrees of parallelism (both intra- and inter-stage)

Graph500 “mini” graph requires 128 GB.

Processing Elements Characteristics

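• CPUs: large caches; large memory (>1 TB)
• GPUs: massive hardware multithreading; caches; limited memory (~16 GB)

Each processor type addresses different graph-processing challenges: data-dependent memory access patterns, large memory footprint, poor locality, low compute-to-memory access ratio, and varying degrees of parallelism (both intra- and inter-stage). A Graph500 “mini” graph requires 128 GB, which far exceeds a single GPU's memory but fits comfortably in CPU host memory.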

Assemble a hybrid platform?

Graph Processing Frameworks

• Programming model (vertex programming / linear algebra)
• Architecture (single-node or distributed)
• High performance
• CPU / GPU / hybrid

Motivation

How does the combination of architecture and programming model improve the performance and efficiency of the system as a whole?

Graph Processing Frameworks

Framework   Developer    Architecture        Programming Model
Galois      UT Austin    CPU                 Vertex Programming
GraphMat    Intel        CPU + Distributed   Linear Algebra
Gunrock     UC Davis     Multi-GPU           Vertex Programming
Nvgraph     Nvidia       GPU                 Linear Algebra
Totem       UBC          CPU + multi-GPU     Vertex Programming
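The table separates frameworks by programming model. The sketch below writes one PageRank iteration in both styles to show the difference. In the vertex-programming version, each vertex gathers rank from its in-neighbors. In the linear-algebra version, the same update is a matrix-vector product. This is not the API of Galois, GraphMat, Gunrock, Nvgraph or Totem. The damping factor 0.85, the three-vertex graph and the function names are assumptions, and the sketch assumes every vertex has at least one out-edge.

```python
# Damping factor 0.85 is an assumed common default, not taken from the slides.
DAMPING = 0.85


def pagerank_step_vertex(in_neighbors, out_degree, rank):
    """Vertex programming: every vertex gathers rank from its in-neighbors."""
    n = len(rank)
    new_rank = []
    for v in range(n):
        gathered = sum(rank[u] / out_degree[u] for u in in_neighbors[v])
        new_rank.append((1 - DAMPING) / n + DAMPING * gathered)
    return new_rank


def pagerank_step_linear_algebra(matrix, rank):
    """Linear algebra: the same update as a matrix-vector product.

    matrix[v][u] = 1 / out_degree[u] if u -> v is an edge, else 0. A dense
    matrix keeps the sketch short; real frameworks use sparse formats.
    """
    n = len(rank)
    product = [sum(matrix[v][u] * rank[u] for u in range(n)) for v in range(n)]
    return [(1 - DAMPING) / n + DAMPING * x for x in product]


if __name__ == "__main__":
    # Hypothetical 3-vertex graph with edges 0->1, 1->2, 2->0, 2->1.
    edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
    n = 3
    out_degree = [0] * n
    in_neighbors = [[] for _ in range(n)]
    for u, v in edges:
        out_degree[u] += 1
        in_neighbors[v].append(u)
    matrix = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        matrix[v][u] = 1.0 / out_degree[u]
    rank = [1.0 / n] * n
    a = pagerank_step_vertex(in_neighbors, out_degree, rank)
    b = pagerank_step_linear_algebra(matrix, rank)
    print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True
```

Both functions compute the same numbers. The difference lies in how the work is expressed: per-vertex logic in one case, and a single bulk sparse-matrix operation (here shown densely) in the other.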

Benchmark Algorithms

• PageRank: ranking web pages; compute intensive
• Single Source Shortest Paths (SSSP): IP routing, transportation networks
• Breadth-First Search (BFS): finding connected components; a subroutine in other algorithms; memory intensive
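BFS and SSSP can both be sketched as frontier-based traversals. This minimal illustration makes two assumptions: the graph is an adjacency list of (neighbor, weight) pairs, and all weights are non-negative.

SSSP here uses Bellman-Ford-style relaxation rounds rather than Dijkstra's priority queue. Each round exposes parallel work, which makes this a common fit for parallel frameworks. It is not necessarily what each benchmarked framework implements.

```python
from collections import deque

INF = float("inf")


def bfs_levels(adj, source):
    """Breadth-first search: hop count from source (INF if unreachable)."""
    level = [INF] * len(adj)
    level[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, _ in adj[u]:          # weights are ignored by BFS
            if level[v] == INF:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def sssp_bellman_ford(adj, source):
    """SSSP by rounds of edge relaxation over the active frontier.

    Only vertices whose distance improved in the previous round are
    processed again, so each round corresponds to one parallel step.
    """
    dist = [INF] * len(adj)
    dist[source] = 0
    frontier = {source}
    while frontier:
        next_frontier = set()
        for u in frontier:
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    next_frontier.add(v)
        frontier = next_frontier
    return dist


if __name__ == "__main__":
    # Hypothetical weighted graph: 0->1 (4), 0->2 (1), 1->3 (1), 2->1 (2), 2->3 (5)
    adj = [[(1, 4), (2, 1)], [(3, 1)], [(1, 2), (3, 5)], []]
    print(bfs_levels(adj, 0))         # [0, 1, 1, 2]
    print(sssp_bellman_ford(adj, 0))  # [0, 3, 1, 4]
```

BFS touches every reachable edge once with almost no arithmetic, which matches the "memory intensive" label above.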

Evaluation Metrics

§ Raw performance: Traversed Edges Per Second (TEPS) = traversed edges / execution time

§ Energy consumption: average power consumed × execution time

§ Scalability: strong scaling with respect to processing units
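The three metrics reduce to simple arithmetic. The sketch below encodes them directly.

For scalability it assumes the usual strong-scaling definitions: the input size stays fixed, speedup is the one-unit time divided by the p-unit time, and efficiency is speedup divided by p. All numbers in the example are illustrative, not measurements from these slides.

```python
def teps(traversed_edges, execution_time_s):
    """Raw performance: Traversed Edges Per Second."""
    return traversed_edges / execution_time_s


def energy_joules(average_power_w, execution_time_s):
    """Energy consumption in watt-seconds (joules): average power x time."""
    return average_power_w * execution_time_s


def strong_scaling(time_one_unit_s, time_p_units_s, p):
    """Strong scaling (fixed input): speedup over one unit, and efficiency."""
    speedup = time_one_unit_s / time_p_units_s
    return speedup, speedup / p


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the slides.
    print(teps(128_000_000, 0.5) / 1e9, "billion TEPS")  # 0.256 billion TEPS
    print(energy_joules(300.0, 0.5), "watt-seconds")     # 150.0 watt-seconds
    print(strong_scaling(8.0, 4.0, 2))                   # (2.0, 1.0)
```

Because energy is power times time, a platform that draws more power can still consume less energy if it finishes sufficiently faster.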

Testbed Characteristics (System 1)

CPU                2x Intel Xeon E5-2695 v3 (Haswell)
# CPU cores        28
Host memory        512 GB DDR4
L3 cache           70 MB
PCIe               3.0 x16
GPU                2x Nvidia Tesla K40c
GPU thread count   2,880
GPU memory         12 GB

Datasets

Graph          #Vertices   #Edges   Max Degree   Avg. Degree
Real world
Com-Orkut      3M          234M     33,313       78
LiveJournal    4.8M        68M      20,292       14
Road-USA       28.8M       47.9M    9            1.6
Twitter        52M         3.9B     3,691,240    75
Synthetic
RMAT22         4M          128M     168,729      32
RMAT23         8M          256M     272,808      32
RMAT24         16M         512M     439,994      32
RMAT27         128M        4B       3,910,241    32

WDC,2012
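The synthetic RMATn graphs have 2^n vertices (for example, RMAT22 has about 4M). Their maximum degree is far above the average degree. The sketch below shows how an R-MAT generator produces that skew.

It assumes the Graph500 quadrant probabilities a=0.57, b=0.19, c=0.19 (d=0.05). The slides do not say which parameters were used. The sketch also omits the vertex relabeling and noise of the official Graph500 generator, so it is an illustration, not the generator behind these datasets.

```python
import random
from collections import Counter


def rmat_edges(scale, edge_factor, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate 2**scale vertices and edge_factor * 2**scale directed edges.

    For each edge, every bit of the (src, dst) pair picks one quadrant of the
    adjacency matrix: a = top-left, b = top-right, c = bottom-left,
    d = 1 - a - b - c = bottom-right. Duplicates and self-loops may occur.
    """
    rng = random.Random(seed)
    n = 1 << scale
    edges = []
    for _ in range(edge_factor * n):
        src = dst = 0
        for bit in range(scale):
            r = rng.random()
            if r < a:
                pass                      # top-left: neither bit set
            elif r < a + b:
                dst |= 1 << bit           # top-right
            elif r < a + b + c:
                src |= 1 << bit           # bottom-left
            else:
                src |= 1 << bit           # bottom-right
                dst |= 1 << bit
        edges.append((src, dst))
    return n, edges


if __name__ == "__main__":
    n, edges = rmat_edges(scale=10, edge_factor=32)
    out_degree = Counter(src for src, _ in edges)
    print(len(edges) / n)            # average degree: 32.0
    print(max(out_degree.values()))  # skewed: far above the average
```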

Memory Consumption

Memory consumption (in MB) for the RMAT22 graph (edge list size: 512 MB). Factors in parentheses are relative to the smallest footprint for each algorithm.

Framework     Memory layout                     PageRank       SSSP           BFS
Nvgraph       CSC (PageRank, SSSP), CSR (BFS)   1,159 (1.8x)   1,111 (1.0x)   683 (1.0x)
Gunrock       CSR and COO                       641 (1.0x)     1,582 (1.4x)   1,443 (2.1x)
Galois        CSR                               1,599 (2.5x)   2,074 (1.9x)   1,432 (2.1x)
GraphMat*     DCSC                              2,818 (4.4x)   2,786 (2.5x)   2,980 (4.4x)
Totem-2S      CSR                               1,275 (2.0x)   2,198 (2.0x)   1,282 (1.9x)
Totem-2S2G    CSR                               1,628 (2.5x)   2,587 (2.3x)   1,658 (2.4x)

* GraphMat: 9,354 MB during the pre-processing step.
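Each factor in parentheses is a framework's footprint divided by the smallest footprint for the same algorithm. For example, GraphMat's PageRank factor is 2,818 / 641 ≈ 4.4. The short check below recomputes every factor from the table values. It only re-derives numbers already shown; the dictionary layout is an illustrative choice.

```python
# Memory consumption (MB) for RMAT22, copied from the table.
MEMORY_MB = {
    "Nvgraph":    {"PageRank": 1159, "SSSP": 1111, "BFS": 683},
    "Gunrock":    {"PageRank": 641,  "SSSP": 1582, "BFS": 1443},
    "Galois":     {"PageRank": 1599, "SSSP": 2074, "BFS": 1432},
    "GraphMat":   {"PageRank": 2818, "SSSP": 2786, "BFS": 2980},
    "Totem-2S":   {"PageRank": 1275, "SSSP": 2198, "BFS": 1282},
    "Totem-2S2G": {"PageRank": 1628, "SSSP": 2587, "BFS": 1658},
}


def normalized(memory):
    """Divide each entry by the smallest footprint for the same algorithm."""
    algorithms = next(iter(memory.values())).keys()
    smallest = {alg: min(row[alg] for row in memory.values()) for alg in algorithms}
    return {fw: {alg: round(mb / smallest[alg], 1) for alg, mb in row.items()}
            for fw, row in memory.items()}


if __name__ == "__main__":
    for framework, factors in normalized(MEMORY_MB).items():
        print(framework, factors)
    # e.g. GraphMat {'PageRank': 4.4, 'SSSP': 2.5, 'BFS': 4.4}
```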

Experimental Results 1. Raw Performance - PageRank

Figure: PageRank performance (billion TEPS per iteration) of Nvgraph, Gunrock, Totem-1G, Galois, GraphMat, Totem-2S and Totem-2S2G on Orkut, LiveJournal, RMAT22, RMAT23, RMAT24, RMAT27 and Twitter.

Fastest: Totem-2S. Nvgraph vs. GraphMat.

Experimental Results 1. Raw Performance - SSSP

Figure: SSSP performance (billion TEPS) of Nvgraph, Gunrock, Totem-1G, Galois, GraphMat, Totem-2S and Totem-2S2G on Orkut, LiveJournal, Road-USA, RMAT22, RMAT24, RMAT27 and Twitter.

Fastest: Totem-2S. CSC is suitable for PageRank.

Graph Layout in Memory

Example graph: 5 vertices, 8 directed edges.

CSR Representation
VertexId   0 1 2 3 4 5*
rowPtr     0 1 3 3 6 8
edgeList   1 2 3 0 2 4 0 2   (indices 0-7)

CSC Representation
VertexId   0 1 2 3 4 5*
colPtr     0 2 3 6 7 8
edgeList   3 4 0 1 3 4 1 3   (indices 0-7)
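The CSR and CSC arrays on this slide describe a 5-vertex graph with directed edges 0→1, 1→2, 1→3, 3→0, 3→2, 3→4, 4→0 and 4→2. That edge set is read back from the CSR rowPtr and edgeList arrays. The minimal sketch below builds both layouts from this edge list and reproduces the slide's arrays. It is not any framework's loader; the function names are hypothetical.

In both layouts the last pointer entry (index 5) equals the number of edges.

```python
# Edge list inferred from the slide's CSR arrays (5 vertices, 8 directed edges).
EDGES = [(0, 1), (1, 2), (1, 3), (3, 0), (3, 2), (3, 4), (4, 0), (4, 2)]


def build_csr(num_vertices, edges):
    """CSR: the out-neighbors of v are edge_list[ptr[v]:ptr[v + 1]]."""
    ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        ptr[src + 1] += 1              # count the out-degree of each source
    for v in range(num_vertices):
        ptr[v + 1] += ptr[v]           # prefix sum: counts -> offsets
    edge_list = [0] * len(edges)
    cursor = ptr[:-1]                  # slicing copies the start offsets
    for src, dst in sorted(edges):     # sorting keeps neighbor lists ascending
        edge_list[cursor[src]] = dst
        cursor[src] += 1
    return ptr, edge_list


def build_csc(num_vertices, edges):
    """CSC: CSR of the transposed graph, so it lists in-neighbors instead."""
    return build_csr(num_vertices, [(dst, src) for src, dst in edges])


def neighbors(layout, v):
    """Contiguous neighbor slice: out-neighbors in CSR, in-neighbors in CSC."""
    ptr, edge_list = layout
    return edge_list[ptr[v]:ptr[v + 1]]


if __name__ == "__main__":
    csr = build_csr(5, EDGES)
    csc = build_csc(5, EDGES)
    print(csr)                # ([0, 1, 3, 3, 6, 8], [1, 2, 3, 0, 2, 4, 0, 2])
    print(csc)                # ([0, 2, 3, 6, 7, 8], [3, 4, 0, 1, 3, 4, 1, 3])
    print(neighbors(csr, 3))  # out-neighbors of 3: [0, 2, 4]
    print(neighbors(csc, 2))  # in-neighbors of 2: [1, 3, 4]
```

This suggests one plausible reason for the slides' "CSC for PageRank, CSR for BFS" observation:

• A pull-style PageRank reads each vertex's in-neighbors, which CSC stores contiguously.
• A top-down BFS or SSSP expands each frontier vertex's out-neighbors, which CSR stores contiguously.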

Experimental Results 1. Raw Performance - BFS

Figure: BFS performance (billion TEPS) of Nvgraph, Gunrock, Totem-1G, Galois, GraphMat, Totem-2S and Totem-2S2G on Orkut, LiveJournal, RMAT22, RMAT24, RMAT27 and Twitter.

Fastest: Totem-2S. Nvgraph vs. GraphMat: CSR is suitable for BFS.

Hybrid: ~2x

Experimental Results 2. Energy Consumption – GPU Frameworks – Orkut Workload

Figure: Energy consumption (watt-seconds, log scale) of Nvgraph, Gunrock, Totem-1G, Totem-2S and Totem-2S2G for PageRank, SSSP and BFS on the Orkut workload.


Experimental Results 2. Energy Consumption – CPU Frameworks – Twitter Workload

Figure: Energy consumption (watt-seconds, log scale) of Galois, GraphMat, Totem-2S and Totem-2S2G for PageRank, SSSP and BFS on the Twitter workload.


Energy efficient: Totem-2S

Summary

• GPU + linear algebra | CPU + vertex programming = good match
• GPU-based frameworks: ?
• CPU-based frameworks: Totem-2S
• Totem hybrid: greenest
• CSC: PageRank
• CSR: BFS, SSSP

Discussion

Does hybrid processing have future potential?

Figure: Energy (watt-seconds) and execution time (seconds) of BFS, SSSP and PageRank on Totem-4S and Totem-2S2G.

Totem-4S vs. Totem-2S2G for RMAT30 (edge list size: 128 GB). 4S machine: 4x Intel Xeon E7-4870 v2 (Ivy Bridge) with 1,536 GB memory.


Hybrid Graph Processing

• CPUs: large caches + summary data structures; large memory (>1 TB)
• GPUs: massive hardware multithreading; caches + summary data structures; limited memory (16 GB!)

Challenges addressed: data-dependent memory access patterns, large memory footprint, poor locality, low compute-to-memory access ratio, and varying degrees of parallelism (both intra- and inter-stage).

Graph processing: low-degree vertices | high-degree vertices
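The slide splits graph processing into low-degree and high-degree vertices across CPUs and GPUs. A hybrid system must also keep the GPU share within its limited memory.

The sketch below illustrates one way to do such a split. Its assumptions are labeled because the slides do not spell out Totem's partitioning:

• By default, high-degree vertices stay on the CPU (large caches, large memory).
• Low-degree vertices are offered to the GPU first.
• The hypothetical parameter `gpu_edge_budget` stands in for GPU memory capacity.

```python
def partition_by_degree(degrees, gpu_edge_budget, high_degree_on_cpu=True):
    """Hypothetical sketch: split vertices between CPU and GPU by degree.

    Vertices are offered to the GPU in degree order (ascending when
    high_degree_on_cpu is True, descending otherwise). A vertex joins the GPU
    partition if its edges still fit in gpu_edge_budget; otherwise it stays
    on the CPU. Returns (cpu_vertices, gpu_vertices), each sorted by id.
    """
    order = sorted(range(len(degrees)), key=lambda v: degrees[v],
                   reverse=not high_degree_on_cpu)
    cpu, gpu = [], []
    used = 0                              # edges already assigned to the GPU
    for v in order:
        if used + degrees[v] <= gpu_edge_budget:
            gpu.append(v)
            used += degrees[v]
        else:
            cpu.append(v)
    return sorted(cpu), sorted(gpu)


if __name__ == "__main__":
    degrees = [1, 50, 2, 3, 40, 1]        # hypothetical skewed degree sequence
    print(partition_by_degree(degrees, gpu_edge_budget=10))
    # ([1, 4], [0, 2, 3, 5]): the two high-degree vertices stay on the CPU
```

With a skewed (power-law-like) degree distribution, the many low-degree vertices give the GPU's massive multithreading plenty of small, uniform tasks. The few high-degree vertices, which carry much of the edge data, remain in the CPU's larger memory.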

Questions

Code: netsyslab.ece.ubc.ca