Graph Placement in a Data Center - Labouseur · draft 1.3 Labouseur, Svegliato, and Hwang 2 IoT is...
Transcript of Graph Placement in a Data Center - Labouseur · draft 1.3 Labouseur, Svegliato, and Hwang 2 IoT is...
Alan G. Labouseur School of Computer Science
and Mathematics Marist College
Poughkeepsie, NY 12601 [email protected]
Justin Svegliato School of Computer Science
and Mathematics Marist College
Poughkeepsie, NY 12601 [email protected]
Jeong-Hyon Hwang Dept. of Computer Science
University at Albany State University of New York
Albany, NY 12222 [email protected]
Distributed Graph Snapshot Placement and Query Performance
in a Data Center Environment
MARIST MARIST
draft 1.3 2Labouseur, Svegliato, and Hwang
IoTisalwaysGrowing• moresourcesandsensors• moreproductsandservices• moremessages• moretransactions
IoTisalwaysChanging• adding,removing,andmodifyingconnections
Inotherwords...IoTisalwaysEvolving
Q: Howdowegaininsightfromevolvingnetworks?
Daily Deluge of Data
3
Dynamic Graphs
Labouseur, Svegliato, and Hwang
A:Wegaininsightfromevolvingnetworksbytreatingthemlike DynamicGraphswherevertices(dots)modelentitiesand edges(lines)modelrelationshipsbetweenentities.
Theevolutionofanetworkcanbemodeledasaseriesofgraphsnapshotsthatrepresentthatnetworkatdifferentpointsintime.
G1
ac
bd
G2 ec
d
a
bf
ce
d
a
b
G3
snapshot snapshotsnapshot
!mepasses !mepasses
draft 1.3
4
G* Overview
Labouseur, Svegliato, and Hwang
G*isourdynamicgraphdatabasesystem.• graphsnapshotdistribution• multi-corescaleup• multi-serverscaleout• deduplicatedstorageforlargegraphs• in-memorycompactindexing• sharedcomputation• sophisticatedparallelgraphqueries• integrationwithRelationaldatabasesandotherstores
• aconvenienteasy-to-useGUI• adistributiondashboardforanalysis
draft 1.3
5
G* Details: Distributed Architecture
Labouseur, Svegliato, and Hwang
QueryParser
QueryOptimizer
QueryCoordinator
querynetwork
executionplan
G* master
query
Communication Layer
Graph ManagerMemory Buffer
Disk
Index
Query Execution Engine
HA
Communication Layer
network
......Graph Manager
Memory Buffer
Disk
Index
Query Execution Engine
HA
Communication Layer
α β
control
data controldata control
(Justin)
draft 1.3
6
G* Details: Deduplicated Snapshot Storage
Labouseur, Svegliato, and Hwang
c
bd
G2 G3ce
G1
df
ce
d
ca
b
a
b
a
bd
G1!G2!G3
ac
b
G1!G2!G3 G1-G2-G3
df
ce
(G2!G3)-G1 G3-G1-G2
!" #
d
(G1!G2)-G3
commonality-baseddeduplication
draft 1.3
7
G* Details: Distributed Query Processing
Labouseur, Svegliato, and Hwang
Average Degree Query (BSP) operator vertex@* = VertexOperator([], [α,β,γ]);operator degree@* = ProjectionOperator([vertex@local], [id, graph.id, cardinality(outgoing_edges)+cardinality(incoming_edges)], [id, graph.id, degree]);operator partial@* = AggregateOperator([degree@local], [count, sum], [*, degree], [count, sum], [graph.id]);operator union@α = UnionOperator([partial@*]);operator total@α = AggregateOperator([union@α], [sum, sum], [count, sum], [count, sum], [graph.id]);operator avg@α = ProjectionOperator([total@local], [graph.id, 1.0*sum/count], [graph.id, avg]);
draft 1.3
(1,1,{G1,G2})
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}), (a,2,{G1,G2}) (b,1,{G1,G2})
(c, !,{G1}), (d, !,{G1,G2}), (c, !,{G2}), (e, !,{G2})(a,!,{G1,G2}) (b, !,{G1,G2})
(1,2,{G1,G2})
(3/4,{G1}), (4/5,{G2})
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
(2,0,{G1}), (3,1,{G2}))
c
bd
{G1,G2,G3}
ac
b
{G1,G2,G3} {G1}
df
ce
{G2,G3} {G3}
d
{G1,G2}
G2G1
ac
e
bd
ac
bd
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}),
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
!" #
query: average degree variation
{G1,G2}
(1,1,{G1,G2})
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}), (a,2,{G1,G2}) (b,1,{G1,G2})
(c, !,{G1}), (d, !,{G1,G2}), (c, !,{G2}), (e, !,{G2})(a,!,{G1,G2}) (b, !,{G1,G2})
(1,2,{G1,G2})
(3/4,{G1}), (4/5,{G2})
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
(2,0,{G1}), (3,1,{G2}))
c
bd
{G1,G2,G3}
ac
b
{G1,G2,G3} {G1}
df
ce
{G2,G3} {G3}
d
{G1,G2}
G2G1
ac
e
bd
ac
bd
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}),
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
!" #
query: average degree variation
{G1,G2}
8
G* Details: Distributed Query Processing
Labouseur, Svegliato, and Hwang
Average Degree Query (BSP) operator vertex@* = VertexOperator([], [α,β,γ]);operator degree@* = ProjectionOperator([vertex@local], [id, graph.id, cardinality(outgoing_edges)+cardinality(incoming_edges)], [id, graph.id, degree]);operator partial@* = AggregateOperator([degree@local], [count, sum], [*, degree], [count, sum], [graph.id]);operator union@α = UnionOperator([partial@*]);operator total@α = AggregateOperator([union@α], [sum, sum], [count, sum], [count, sum], [graph.id]);operator avg@α = ProjectionOperator([total@local], [graph.id, 1.0*sum/count], [graph.id, avg]);
draft 1.3
(1,1,{G1,G2})
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}), (a,2,{G1,G2}) (b,1,{G1,G2})
(c, !,{G1}), (d, !,{G1,G2}), (c, !,{G2}), (e, !,{G2})(a,!,{G1,G2}) (b, !,{G1,G2})
(1,2,{G1,G2})
(3/4,{G1}), (4/5,{G2})
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
(2,0,{G1}), (3,1,{G2}))
c
bd
{G1,G2,G3}
ac
b
{G1,G2,G3} {G1}
df
ce
{G2,G3} {G3}
d
{G1,G2}
G2G1
ac
e
bd
ac
bd
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}),
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
!" #
query: average degree variation
{G1,G2}
9
G* Details: Distributed Query Processing
Labouseur, Svegliato, and Hwang
Average Degree Query (BSP) operator vertex@* = VertexOperator([], [α,β,γ]);operator degree@* = ProjectionOperator([vertex@local], [id, graph.id, cardinality(outgoing_edges)+cardinality(incoming_edges)], [id, graph.id, degree]);operator partial@* = AggregateOperator([degree@local], [count, sum], [*, degree], [count, sum], [graph.id]);operator union@α = UnionOperator([partial@*]);operator total@α = AggregateOperator([union@α], [sum, sum], [count, sum], [count, sum], [graph.id]);operator avg@α = ProjectionOperator([total@local], [graph.id, 1.0*sum/count], [graph.id, avg]);
draft 1.3
10
G* Details: Distributed Query Processing
Labouseur, Svegliato, and Hwang
(1,1,{G1,G2})
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}), (a,2,{G1,G2}) (b,1,{G1,G2})
(c, !,{G1}), (d, !,{G1,G2}), (c, !,{G2}), (e, !,{G2})(a,!,{G1,G2}) (b, !,{G1,G2})
(1,2,{G1,G2})
(3/4,{G1}), (4/5,{G2})
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
(2,0,{G1}), (3,1,{G2}))
c
bd
{G1,G2,G3}
ac
b
{G1,G2,G3} {G1}
df
ce
{G2,G3} {G3}
d
{G1,G2}
G2G1
ac
e
bd
ac
bd
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}),
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
!" #
query: average degree variation
Average Degree Query (BSP) operator vertex@* = VertexOperator([], [α,β,γ]);operator degree@* = ProjectionOperator([vertex@local], [id, graph.id, cardinality(outgoing_edges)+cardinality(incoming_edges)], [id, graph.id, degree]);operator partial@* = AggregateOperator([degree@local], [count, sum], [*, degree], [count, sum], [graph.id]);operator union@α = UnionOperator([partial@*]);operator total@α = AggregateOperator([union@α], [sum, sum], [count, sum], [count, sum], [graph.id]);operator avg@α = ProjectionOperator([total@local], [graph.id, 1.0*sum/count], [graph.id, avg]);
draft 1.3
11
G* Details: Distributed Query Processing
Labouseur, Svegliato, and Hwang
(1,1,{G1,G2})
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}), (a,2,{G1,G2}) (b,1,{G1,G2})
(c, !,{G1}), (d, !,{G1,G2}), (c, !,{G2}), (e, !,{G2})(a,!,{G1,G2}) (b, !,{G1,G2})
(1,2,{G1,G2})
(3/4,{G1}), (4/5,{G2})
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
(2,0,{G1}), (3,1,{G2}))
c
bd
{G1,G2,G3}
ac
b
{G1,G2,G3} {G1}
df
ce
{G2,G3} {G3}
d
{G1,G2}
G2G1
ac
e
bd
ac
bd
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}),
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
!" #
query: average degree variation
Average Degree Query (BSP) operator vertex@* = VertexOperator([], [α,β,γ]);operator degree@* = ProjectionOperator([vertex@local], [id, graph.id, cardinality(outgoing_edges)+cardinality(incoming_edges)], [id, graph.id, degree]);operator partial@* = AggregateOperator([degree@local], [count, sum], [*, degree], [count, sum], [graph.id]);operator union@α = UnionOperator([partial@*]);operator total@α = AggregateOperator([union@α], [sum, sum], [count, sum], [count, sum], [graph.id]);operator avg@α = ProjectionOperator([total@local], [graph.id, 1.0*sum/count], [graph.id, avg]);
draft 1.3
12
G* Details: Distributed Query Processing
Labouseur, Svegliato, and Hwang
(1,1,{G1,G2})
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}), (a,2,{G1,G2}) (b,1,{G1,G2})
(c, !,{G1}), (d, !,{G1,G2}), (c, !,{G2}), (e, !,{G2})(a,!,{G1,G2}) (b, !,{G1,G2})
(1,2,{G1,G2})
(3/4,{G1}), (4/5,{G2})
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
(2,0,{G1}), (3,1,{G2}))
c
bd
{G1,G2,G3}
ac
b
{G1,G2,G3} {G1}
df
ce
{G2,G3} {G3}
d
{G1,G2}
G2G1
ac
e
bd
ac
bd
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
vertex
degree
count, sum
average
union
vertex
degree
count, sum
vertex
degree
count, sum
(c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2}),
(1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
!" #
query: average degree variation
Average Degree Query (BSP) operator vertex@* = VertexOperator([], [α,β,γ]);operator degree@* = ProjectionOperator([vertex@local], [id, graph.id, cardinality(outgoing_edges)+cardinality(incoming_edges)], [id, graph.id, degree]);operator partial@* = AggregateOperator([degree@local], [count, sum], [*, degree], [count, sum], [graph.id]);operator union@α = UnionOperator([partial@*]);operator total@α = AggregateOperator([union@α], [sum, sum], [count, sum], [count, sum], [graph.id]);operator avg@α = ProjectionOperator([total@local], [graph.id, 1.0*sum/count], [graph.id, avg]);
draft 1.3
13
Dynamic Graph Analytics
Labouseur, Svegliato, and Hwang
Usingdistributedqueries,wecananalyzenetworkevolutiontodiscovertrendsandinsightscrucialtomanyLieldswithintheInternetofThings:
• networksofelectronichealthrecords• geolocationtrackers• diagnosticdatafromconnecteddevices(e.g.,smartwatches)• andmore…
G1
ac
bd
G2 ec
d
a
bf
ce
d
a
b
G3
!mepasses !mepasses
draft 1.3
14
Wait. Isn’t this already solved?
Labouseur, Svegliato, and Hwang
Sofar...• Acceleratingqueriesbydistributingstaticdataovermultipleworkershasbeenwellexplored.
However...• Acceleratingqueriesbydistributingcontinuouslygenerateddatapresentsnewchallenges.
• Techniquesforpartitioningindividualgraphstofacilitateparallelcomputationhavebeendeveloped.
• Techniquesforpartitioningseriesofgraphs(snapshots)tofacilitateparallelcomputationraisenewchallenges.‣ cannotconsideronlyonegraphatatime
‣ cannotdistributeeverysnapshottoallworkers
Sothere’saproblem.
draft 1.3
15
The Problem
Labouseur, Svegliato, and Hwang
Thesnapshotdistributionproblem:Howdowedistributesnapshotsofacontinuouslyevolvingnetworkamongouravailableworkersinawaythat’sef:icient,scalable,andoptimizedtoprocessvarioustypesofqueries?
Formally:
draft 1.3
16
Queries
Labouseur, Svegliato, and Hwang
PageRankrequiresmanyverticestobeexaminedtocomputethevalueforeachvertex.
AverageDegreerequiresonlyonevertextobeexaminedtocomputethevalueforeachvertex.
n. . . α β γ
. . . . . .
. . . . . .
n. . . α β γ
. . . . . .
. . . . . .
draft 1.3
17
Distributions
Labouseur, Svegliato, and Hwang
SharedNothingstoresalloftheverticescomprisingeachsnapshotononeofthenworkers.SnapshotG2isstoredentirelyonworkerβ.
n. . .
1-1
. . . . . .
. . . . . .
1-2
1-n
2-1
2-2
3-1
3-2
2-n 3-n
m-1
m-2
m-n
α β γ
n. . .
1-1
. . . . . .
. . . . . .
2-1
m-1
1-2
2-2
1-3
2-3
m-2 m-3
1-m
2-m
m-n
α β γ
SharedEverythingstorestheverticescomprisingeachsnapshotsuchthattheyaresharedamongallofthenworkers.SnapshotG2issharedacrossallworkers.
draft 1.3
18
Data Center Environment
Labouseur, Svegliato, and Hwang
IBMPureFlexhardware• three2.9GHZIntelXeonE5-2690CPUsof8coreseach(for24corestotal)
• 131GBRAM• dedicated20TBstorageareanetwork
Software• Ubuntu14.04.2LTS(GNU/Linux 3.13.0-57-generic x86 64)virtualmachinesoneachofourthreebladesconLiguredwithaccesstoall8coresanda70GBSANpartition
• OpenJDKRuntimeEnvironment(IcedTea2.5.5).• G*Databaseinstalledononemasterand23workers,eachtasksettoasinglecore
draft 1.3
Wecomputedtherelativespeedupfor• PageRankandAverageDegreequeries• ononesnapshotandallsnapshots(23evolutionarygraphs)• placedinSharedNothingandSharedEverythingdistributions• withvertexcountsof1K,2K,4K,and8K
19
Experiments
Labouseur, Svegliato, and Hwangdraft 1.3
20
Relative Speedup
Labouseur, Svegliato, and Hwang
RelativespeedupistheaveragequeryexecutiontimeforagivenqueryinSharedEverythingplacement(τse)dividedbytheaveragequeryexecutiontimeforSharedNothingplacement(τsn).
Speedup=τse÷τsn
BothPageRankandAverageDegreequerieswereexecutedLivetimesandtheresultingexecutiontimesaveraged.
draft 1.3
21
Results: PageRank on all snapshots
Labouseur, Svegliato, and Hwang
NotetheincreasingspeedupofSharedNothingoverSharedEverythingforPageRankqueriesonallsnapshots.
Speedupgrowsasdatasetgrows.
Shows• beneLitofparallelismvs.overheadofcommunication
• importanceofsnapshotplacement
Remember:PageRankqueriesinvolvesigniLicantcommunicationamongvertices.
Commoncase:AnalysisofchanginginLluenceasnetworksevolve.
draft 1.3
22
Results: Average Degree on all snapshots
Labouseur, Svegliato, and Hwang
NotetheconsistencyinSharedNothingandSharedEverythingforAverageDegreequeriesonallshapshots.
Speedupisunchangedasdatasetgrows.
Shows• beneLitsof“embarrassing”parallelism(nocommunicationoverhead)
• snapshotplacementdoesnotmatterRemember:AverageDegreequeriesinvolve
nocommunicationamongvertices.
Commoncase:Analysisofchangingpopularityasnetworksevolve.
draft 1.3
23
Results: PageRank on one snapshot
Labouseur, Svegliato, and Hwang
NotethedecreasingspeedupofSharedNothingoverSharedEverythingforPageRankqueriesononesnapshot.
Speedupshrinksasdatasetgrows.
Shows• asthedatasetgrows,thebeneLitofparallelismdecreasesduetoincreasingcommunicationoverhead
Remember:PageRankqueriesinvolvesigniLicantcommunicationamongvertices.
draft 1.3
24
Results: Average Degree on one snapshot
Labouseur, Svegliato, and Hwang
NotethedecreasingspeedupofSharedNothingoverSharedEverythingforAverageDegreequeriesononeshapshot.
Speedupshrinksasdatasetgrows.
Shows• impactofcommunicationoverheadonparallelism
• importanceofsnapshotplacement
Remember:AverageDegreequeriesinvolvenocommunicationamongvertices.
draft 1.3
25
Conclusions So Far
Labouseur, Svegliato, and Hwang
Eveninadatacenterenvironment(withfast,sharedresources),therearesigniMicantbeneMitsofcarefulsnapshotplacementinthemostcommonandimportantcases.• WhenmultiplesnapshotsarequeriedtogetherandthosequeriesinvolvesigniLicantcommunication(i.e.,PageRankonallsnapshots)itisadvantageoustostoreeachsnapshotonfewerservers.
• But…itisimportantthattheoverallquerieddatabebalancedoverallserverstoavoidhurtingtheperformanceofqueriesthatdonotinvolvesigniLicantcommunicationamongworkers(i.e.,AverageDegreeononesnapshot).
draft 1.3
26
Next Steps
Labouseur, Svegliato, and Hwang
Ourworkcontinues.We’re...• experimentingwithlargeranddifferentdatasets
‣ long-tailBarabasi-Albertgraphs‣ socialgraphsgeneratedbytheLinkedDataBenchmarkCouncilbenchmarkingtool
• increasingtheresolutionofourexecutiontimemeasurementstorecorddataattheBSPsupersteplevel.
• comparingourdatacenterresultswiththoseobtainedfromexperimentingonthe64-coreserverclusterweusedinourpriorwork‣ Thismaysuggestinsightintographqueryperformanceonclustersofconnectedcomputerswithnosharedresourcesascomparedtodatacenterenvironmentswithmanysharedresources.
draft 1.3
27
Thank you. Questions?
Alan G. Labouseur School of Computer Science
and Mathematics Marist College
Poughkeepsie, NY 12601 [email protected]
Justin Svegliato School of Computer Science
and Mathematics Marist College
Poughkeepsie, NY 12601 [email protected]
Jeong-Hyon Hwang Dept. of Computer Science
University at Albany State University of New York
Albany, NY 12222 [email protected]
Distributed Graph Snapshot Placement and Query Performance
in a Data Center Environment
draft 1.3