Query Languages for Unrestricted Graph Data
Alin Deutsch, UC San Diego



The Age of the Graph Is Upon Us (Again)

• Early-to-mid 90s: semi- or un-structured data research was all the rage
  – data logically viewed as a graph
  – initially motivated by modeling the WWW (page = vertex, link = edge)
  – query languages expressing constrained reachability in the graph

• Late 90s: special case XML (graph restricted to tree shape)

• 2000s: JSON and friends (also tree-shaped)

• ~2010 to present: back to unrestricted graphs
  – initially motivated by analytic tasks in social networks
  – now in universal use (data is linked in all scenarios)

The Unrestricted Graph Data Model

• Nodes correspond to entities

• Edges are binary and correspond to relationships

• Edges may be directed or undirected

• Nodes and edges may carry labels

• Nodes and edges are annotated with data
  – both have sets of attributes (key-value pairs)

• A schema is not required to formulate queries

Example Graph

Vertex types:
• Product (name, category, price)
• Customer (ssn, name, address)

Edge types:
• Bought (discount, quantity)
• Customer c bought 100 units of product p at a 5% discount: modeled by the edge c -(Bought {discount=5%, quantity=100})-> p
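One way to picture this data model outside a database is as plain records, each carrying a label and a property map. The following is a minimal Python sketch under that assumption; the classes `Vertex` and `Edge`, the helper `out_edges`, and all data values are hypothetical and are not taken from any graph system's API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    label: str                                  # vertex type, e.g. "Customer" or "Product"
    props: dict = field(default_factory=dict)   # attributes as key-value pairs


@dataclass
class Edge:
    src: str
    tgt: str
    label: str                                  # edge type, e.g. "Bought"
    props: dict = field(default_factory=dict)   # edges carry attributes too


# Made-up sample data following the vertex and edge types above.
vertices = {
    "c": Vertex("c", "Customer", {"ssn": "111", "name": "Ann", "address": "San Diego"}),
    "p": Vertex("p", "Product", {"name": "Robot", "category": "toys", "price": 20.0}),
}
edges = [
    # Customer c bought 100 units of product p at a 5% discount.
    Edge("c", "p", "Bought", {"discount": 0.05, "quantity": 100}),
]


def out_edges(vid, label=None):
    """Return the edges leaving vertex vid, optionally restricted to one edge label."""
    return [e for e in edges if e.src == vid and (label is None or e.label == label)]
```

No schema is declared anywhere: a new vertex or edge type is just a new label string.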

Expressing Graph Analytics

• Two different approaches
  – high-level query languages à la SQL
  – low-level programming abstractions

• Programs written in C++, Java, Scala, Groovy…

• Initially adopted by disjoint communities (recall the NoSQL debates)

• Recent trend towards unification

High-Level Query Languages

Some Modern Graph QLs We Will Discuss

There is a host of them! The spectrum includes:
• Datalog with aggregation (LogicBlox)

• Cypher (neo4j)
  – declarative, highly similar to StruQL (and hence to CRPQs)

• Gremlin (Apache and commercial projects)
  – dataflow programming model: the graph is annotated with tokens (“traversers”) that flow through it according to the user program

• New arrival: GSQL (TigerGraph)
  – inspired by SQL + BSP, extended for more flexible grouping/aggregation

Key Ingredients for High-Level Query Languages

• Pioneered by academic work on Conjunctive Query (CQ) extensions for graphs (since ’87)
  – path expressions (PEs) for navigation
  – variables for manipulating data found during navigation
  – stitching multiple PEs into complex navigation patterns → conjunctive regular path queries (CRPQs)

• Beyond CRPQs, needed in modern applications:
  – aggregation of data encountered during navigation ⇒ support for bag semantics as a prerequisite
  – intermediate results assigned to nodes/edges
  – control-flow support for the class of iterative algorithms that converge to a result in multiple steps
    (e.g., PageRank-class algorithms, recommender systems, shortest paths)

Path Expressions

• Express reachability via constrained paths

• Early graph-specific extension over conjunctive queries

• Introduced initially in academic prototypes in the early 90s
  – StruQL (AT&T Research; Fernandez, Halevy, Suciu)
  – WebSQL (Mendelzon, Mihaila, Milo)
  – Lorel (Widom et al.)

• Today supported by the languages of commercial systems
  – Cypher, SPARQL, Gremlin, GSQL

Path Expression Syntax

Notations vary. We adopt here (roughly) that of the SPARQL W3C Recommendation:

path → edge label
     | _                  // wildcard, any edge label
     | ^edge label        // inverse edge
     | path.path          // concatenation
     | path|path          // alternation
     | path*              // 0 or more repetitions
     | path*(min,max)     // at least min, at most max repetitions
     | (path)

Path Expression Examples (1)

• Pairs of customers and the products they bought:

Bought

• Pairs of customers and the products they were involved with (bought or reviewed):

Bought|Reviewed

• Pairs of customers who bought the same product (lists customers paired with themselves):

Bought.^Bought

Path Expression Examples (2)

• Pairs of customers involved with the same product (like-minded):

(Bought|Reviewed).(^Bought|^Reviewed)

• Pairs of customers connected via a chain of like-minded customer pairs:

((Bought|Reviewed).(^Bought|^Reviewed))*

Path Expression Semantics

• In most academic research, the semantics is defined in terms of sets of node pairs

• Traditionally specified in two ways:
  – declaratively, based on satisfaction of formulae/patterns
  – procedurally, based on algebraic operations over relations

• The two are equivalent

Classical Declarative Semantics

• Given:
  – a graph G
  – a path expression PE

• the meaning of PE on G, denoted PE(G), is the set of node pairs (src, tgt) such that there exists a path in G from src to tgt whose concatenated labels spell out a word in L(PE),

  where L(PE) is the language accepted by PE when seen as a regular expression over the alphabet of edge labels

Classical Procedural Semantics

PE(G) is a binary relation over nodes, defined inductively as:

• E(G) = set of s-t node pairs of E edges in G

• _(G) = set of s-t node pairs of any edges in G

• ^E(G) = set of t-s node pairs of E edges in G

• P1.P2(G) = P1(G) ∘ P2(G)   (relational composition)

• P1|P2(G) = set union(P1(G), P2(G))

• P*(G) = reflexive transitive closure of P(G)   (finite due to saturation)
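The inductive definition translates almost line by line into a small evaluator. This Python sketch assumes a graph given as a list of (src, label, tgt) triples and encodes a PE as a nested tuple such as `("concat", ("label", "Bought"), ("inv", "Bought"))`. The encoding and the function names are illustrative choices. The star case iterates until no new pairs appear, which is the saturation argument for finiteness.

```python
def compose(r1, r2):
    """Relational composition: (x, z) whenever (x, y) is in r1 and (y, z) is in r2."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}


def evaluate(pe, graph):
    """Set semantics of a path expression over graph, a list of (src, label, tgt) triples."""
    op = pe[0]
    if op == "label":                        # E
        return {(s, t) for s, l, t in graph if l == pe[1]}
    if op == "any":                          # _
        return {(s, t) for s, _, t in graph}
    if op == "inv":                          # ^E
        return {(t, s) for s, l, t in graph if l == pe[1]}
    if op == "concat":                       # P1.P2
        return compose(evaluate(pe[1], graph), evaluate(pe[2], graph))
    if op == "alt":                          # P1|P2
        return evaluate(pe[1], graph) | evaluate(pe[2], graph)
    if op == "star":                         # P*
        base = evaluate(pe[1], graph)
        nodes = {s for s, _, _ in graph} | {t for _, _, t in graph}
        closure = {(v, v) for v in nodes}    # reflexive part: the empty path
        while True:
            new = closure | compose(closure, base)
            if new == closure:               # saturation: no new pairs, so stop
                return closure
            closure = new
    raise ValueError(f"unknown operator {op!r}")


# Made-up graph: c2 reviewed p2, which c3 bought, so c1 and c3 are linked via c2.
G = [("c1", "Bought", "p1"), ("c2", "Bought", "p1"),
     ("c2", "Reviewed", "p2"), ("c3", "Bought", "p2")]

B, R = ("label", "Bought"), ("label", "Reviewed")
same_product = ("concat", B, ("inv", "Bought"))                   # Bought.^Bought
like_minded = ("concat", ("alt", B, R),
               ("alt", ("inv", "Bought"), ("inv", "Reviewed")))   # (Bought|Reviewed).(^Bought|^Reviewed)
chain = ("star", like_minded)                                     # ((Bought|Reviewed).(^Bought|^Reviewed))*
```

For example, `evaluate(same_product, G)` lists every customer paired with itself, and only the starred `chain` relates c1 to c3.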

Conjunctive Regular Path Queries

• Replace the relational atoms appearing in CQs with path expressions.

• Explicitly introduce variables binding to the source and target nodes of path expressions.

• Allow multiple path expression atoms in the query body.

• Variables can be used to stitch multiple path expression atoms into complex patterns.

CRPQ Examples

• Pairs of customers who have bought the same product (do not list a customer with herself):

Q1(c1, c2) :- c1 -Bought.^Bought-> c2, c1 != c2

• Customers who have bought and also reviewed a product:

Q2(c) :- c -Bought-> p, c -Reviewed-> p

CRPQ Semantics

• Naturally extended from single path expressions, following the model of CQs

• Declarative
  – lifting the notion of satisfaction of a path expression atom by a source-target node pair to the notion of satisfaction of a conjunction of atoms by a tuple

• Procedural
  – based on SPRJ (select, project, rename, join) manipulation of the binary relations yielded by the individual path expression atoms
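Under the procedural reading, each atom contributes a binary relation, and the query combines those relations with selection, projection, and join. Here is a minimal Python sketch for Q1 and Q2 from the CRPQ examples, over a made-up four-edge graph (the customer and product names are hypothetical).

```python
G = [("ann", "Bought", "lego"), ("bob", "Bought", "lego"),
     ("bob", "Reviewed", "lego"), ("cat", "Bought", "doll")]


def atom(label, graph):
    """Binary relation yielded by a single-label path expression atom."""
    return {(s, t) for s, l, t in graph if l == label}


def compose(r1, r2):
    """Relational composition, used for the concatenation inside Q1's atom."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}


bought, reviewed = atom("Bought", G), atom("Reviewed", G)
inv_bought = {(t, s) for s, t in bought}

# Q1(c1, c2) :- c1 -Bought.^Bought-> c2, c1 != c2
# one path-expression atom, followed by a selection on c1 != c2
q1 = {(c1, c2) for (c1, c2) in compose(bought, inv_bought) if c1 != c2}

# Q2(c) :- c -Bought-> p, c -Reviewed-> p
# two atoms sharing c and p: join on both variables, then project onto c
q2 = {c for (c, p) in bought if (c, p) in reviewed}
```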

Limitation of Set Semantics

• Common graph analytics need to aggregate data
  – e.g., count the number of products two customers have in common

• Set semantics does not suffice
  – baked-in duplicate elimination affects the aggregation

• As in SQL, practical systems resort to bag semantics
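Here is a tiny Python illustration of the problem, on a made-up graph in which two customers share two products. Evaluating Bought.^Bought as a bag keeps one (c1, c2) entry per shared product. The set version collapses them into a single pair, so a count over the set yields 1 instead of 2.

```python
from collections import Counter

G = [("ann", "Bought", "lego"), ("ann", "Bought", "doll"),
     ("bob", "Bought", "lego"), ("bob", "Bought", "doll")]

bought = [(s, t) for s, l, t in G if l == "Bought"]
inv_bought = [(t, s) for s, t in bought]

# Bag composition: one output pair per matching ((x, y), (y, z)) combination,
# i.e. one pair per path x -Bought-> y <-Bought- z.
bag = Counter((x, z) for (x, y1) in bought for (y2, z) in inv_bought if y1 == y2)
as_set = set(bag)                                   # set semantics: duplicates eliminated

in_common_bag = bag[("ann", "bob")]                 # one entry per shared product
in_common_set = sum(1 for pair in as_set if pair == ("ann", "bob"))
```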

Path Expressions Under Bag Semantics

PE(G) is a bag of node pairs, defined inductively as:

• E(G) = bag of s-t node pairs of E edges in G

• _(G) = bag of s-t node pairs of any edges in G

• ^E(G) = bag of t-s node pairs of E edges in G

• P1.P2(G) = P1(G) ∘ P2(G)   (relational composition for bags)

• P1|P2(G) = bag union(P1(G), P2(G))

• P*(G) = reflexive transitive closure of P(G)   (not necessarily finite under bag semantics!)

Issues with Bag Semantics

• Performance and semantic issues due to the number of distinct paths

• The multiplicity of an s-t pair in the query output reflects the number of distinct paths connecting s with t
  – Even in DAGs, there can be exponentially many. Chain of diamonds example: n diamonds in sequence yield 2^n distinct s-t paths.
  – More serious: in cyclic graphs, there can be infinitely many
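The chain-of-diamonds blow-up is easy to check numerically. The Python sketch below counts s-t paths in a DAG by memoized recursion (the graph generator and vertex names are hypothetical). This computes a bag-semantics multiplicity in time linear in the graph, even though listing the paths themselves would take exponential time. It assumes an acyclic input: on a cyclic graph the recursion would not terminate, which mirrors the infinite-multiplicity problem.

```python
from functools import lru_cache


def chain_of_diamonds(n):
    """Edges of n diamonds in sequence: v_i -> a_i -> v_{i+1} and v_i -> b_i -> v_{i+1}."""
    edges = []
    for i in range(n):
        v, w = f"v{i}", f"v{i + 1}"
        edges += [(v, f"a{i}"), (f"a{i}", w), (v, f"b{i}"), (f"b{i}", w)]
    return edges


def count_dag_paths(edges, s, t):
    """Number of distinct s-t paths in a DAG (assumes the graph has no cycles)."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)

    @lru_cache(maxsize=None)
    def paths_from(x):
        if x == t:
            return 1            # in a DAG no path can leave t and come back to it
        return sum(paths_from(y) for y in succ.get(x, []))

    return paths_from(s)
```

For instance, `count_dag_paths(chain_of_diamonds(30), "v0", "v30")` returns 2**30 after touching only about 90 vertices.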

Solutions in Practice: Bound Traversal Length

• Upper-bound the length of the traversed path
  – recall the bounded Kleene construct *(min,max)
  – bounds the length, and hence the number, of distinct paths considered
  – supported by Gremlin, Cypher, SPARQL, GSQL; very common in tutorial examples and in industrial practice
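To see why a length bound restores finiteness, the following Python sketch counts the walks from s to t whose length lies in [lo, hi]. It works by propagating per-vertex walk counts one step at a time. The function name and the sample graphs are illustrative. On the two-vertex cycle used below, the number of unbounded s-t walks is infinite, but any finite bound yields a finite count.

```python
def count_bounded_walks(edges, s, t, lo, hi):
    """Number of s-t walks whose length (number of edges) lies in [lo, hi]."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)
    counts = {s: 1}            # counts[v]: walks of the current length from s ending at v
    total = 0
    for length in range(hi + 1):
        if length >= lo:
            total += counts.get(t, 0)
        step = {}
        for v, c in counts.items():          # extend every walk by one edge
            for w in succ.get(v, []):
                step[w] = step.get(w, 0) + c
        counts = step
    return total
```

For example, on the cycle s → t → s, `count_bounded_walks([("s", "t"), ("t", "s")], "s", "t", 1, 5)` counts the walks of lengths 1, 3, and 5, i.e. 3.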

Solutions in Practice: Restrict Cycle Traversal

• No repeating vertices (simple paths)
  – rules out paths that go around cycles
  – recommended in Gremlin style guides, tutorials, and a formal semantics paper
  – Gremlin’s simplePath() step supports this semantics
  – problem: membership of an s-t pair in the result is NP-hard

• No repeating edges
  – allows cyclic paths
  – rules out paths that go around the same cycle more than once
  – this is the Cypher semantics

Solutions in Practice: Mix Bag and Set Semantics

• Bag semantics for star-free fragments of the PE

• Set semantics for Kleene-starred fragments of the PE

• Combine them using (bag-aware) joins

• Example: p1.p2*.p3(G) is treated as p1(G) ∘ distinct(p2*(G)) ∘ p3(G)

• This is the SPARQL semantics (in the W3C Recommendation)
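Here is a Python sketch of the mixed evaluation, assuming bags are represented as `collections.Counter` objects keyed by node pairs. Star-free atoms keep their multiplicities, while the starred sub-expression is computed as a set closure and then joined with multiplicity 1. The graph is made up: it has two parallel p1 edges and a p2 cycle between b and c, which under pure bag semantics would yield infinitely many paths.

```python
from collections import Counter


def rel_bag(graph, label):
    """Bag of (src, tgt) pairs of label-edges; parallel edges keep their multiplicity."""
    return Counter((s, t) for s, l, t in graph if l == label)


def compose(b1, b2):
    """Bag composition: multiplicities multiply for every join partner."""
    out = Counter()
    for (x, y1), m1 in b1.items():
        for (y2, z), m2 in b2.items():
            if y1 == y2:
                out[(x, z)] += m1 * m2
    return out


def star_distinct(graph, label):
    """label* under set semantics: reflexive-transitive closure, each pair once."""
    nodes = {s for s, _, _ in graph} | {t for _, _, t in graph}
    step = {(s, t) for s, l, t in graph if l == label}
    closure = {(v, v) for v in nodes}
    while True:
        new = closure | {(x, z) for (x, y) in closure for (y2, z) in step if y == y2}
        if new == closure:
            return Counter(closure)       # multiplicity 1 per pair, i.e. distinct(...)
        closure = new


# Two parallel p1 edges, and a p2 cycle between b and c.
G = [("a", "p1", "b"), ("a", "p1", "b"),
     ("b", "p2", "c"), ("c", "p2", "b"),
     ("c", "p3", "d")]

# p1.p2*.p3 treated as p1(G) o distinct(p2*(G)) o p3(G)
result = compose(compose(rel_bag(G, "p1"), star_distinct(G, "p2")), rel_bag(G, "p3"))
```

The pair (a, d) ends up with multiplicity 2. Both copies come from the parallel p1 edges, and none comes from the b-c cycle.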

Solutions in Practice: Leave It to the User

• The user explicitly programs the desired semantics

• The path is a first-class citizen and can be mentioned in the query

• Can simulate each of the above semantics, e.g., by checking the path for repeated nodes/edges

• Could lead to infinite traversals for buggy programs

• Supported by Gremlin and GSQL
  – also partially by Cypher (modulo the restriction that only edge-non-repeating paths are visible)

One Semantics I Would Prefer

• Allow paths to go around cycles, even multiple times

• Achieve finiteness by restricting to pumping-minimal paths
  – in the sense of the Pumping Lemma for Finite State Automata (FSA)
  – PEs are regular expressions, so they have an equivalent FSA representation (the minimal deterministic FSA is unique up to renaming of states)
  – as the path is traversed, the FSA makes a transition at every step
  – rule out paths in which a vertex is repeatedly reached in the same FSA state

• Can be programmed by the user in Gremlin and GSQL (costly!)
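One way to enumerate pumping-minimal paths is a depth-first search over (vertex, automaton state) pairs that refuses to revisit a pair. The Python sketch below assumes the PE has already been compiled into a deterministic FSA, given as a transition dictionary. Here that dictionary is written by hand for (E.E)*, i.e. paths of even length. The function name and the graph are hypothetical. In the sample graph, the only even-length s-t path goes once around the triangle s, a, b. Simple-path semantics would reject it because it revisits s, but it reaches s in a different FSA state, so it is pumping-minimal. The search can still explore exponentially many paths, consistent with the "costly!" remark.

```python
def pumping_minimal_paths(edges, delta, q0, accepting, s, t):
    """All s-t paths accepted by the DFA in which no (vertex, DFA state) pair repeats.

    edges: list of (src, label, tgt); delta: dict mapping (state, label) -> state.
    """
    succ = {}
    for x, label, y in edges:
        succ.setdefault(x, []).append((label, y))
    results = []

    def dfs(v, q, path, seen):
        if v == t and q in accepting:
            results.append(list(path))
        for label, w in succ.get(v, []):
            q2 = delta.get((q, label))
            if q2 is None or (w, q2) in seen:
                continue                     # no transition, or (vertex, state) would repeat
            seen.add((w, q2))
            path.append(w)
            dfs(w, q2, path, seen)
            path.pop()
            seen.remove((w, q2))

    dfs(s, q0, [s], {(s, q0)})
    return results


# Hand-built DFA for (E.E)*: state 0 = even number of edges so far (accepting), 1 = odd.
EVEN = {(0, "E"): 1, (1, "E"): 0}
# Triangle s -> a -> b -> s plus a direct edge s -> t.
G = [("s", "E", "a"), ("a", "E", "b"), ("b", "E", "s"), ("s", "E", "t")]

paths = pumping_minimal_paths(G, EVEN, 0, {0}, "s", "t")
```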

A Tractable Semantics: Shortest Paths

• For a pattern x -Pattern-> y, the vertex pair (s, t) is an answer iff there is a path p from s to t such that
  – the word spelled by the edge labels of p is in L(Pattern)
  – p is shortest among all such paths from s to t

• The multiplicity of (s, t) in the answer is the count of such shortest paths
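Shortest-path multiplicities can be computed in polynomial time, unlike the enumeration-based semantics. The approach is a breadth-first search over (vertex, FSA state) pairs that counts, for each pair, the number of shortest ways to reach it. Here is a Python sketch, assuming the pattern is given as a hand-built deterministic FSA, so that each graph path corresponds to exactly one run of the automaton. The names and sample graphs are hypothetical.

```python
from collections import deque


def shortest_path_answer(edges, delta, q0, accepting, s, t):
    """(length, multiplicity) of the shortest s-t paths whose labels the DFA accepts,
    or None if no such path exists. BFS runs over (vertex, DFA state) pairs."""
    succ = {}
    for x, label, y in edges:
        succ.setdefault(x, []).append((label, y))
    start = (s, q0)
    dist, count = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        v, q = node
        for label, w in succ.get(v, []):
            q2 = delta.get((q, label))
            if q2 is None:
                continue
            nxt = (w, q2)
            if nxt not in dist:                       # first visit: shortest distance found
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:         # another shortest way to reach nxt
                count[nxt] += count[node]
    hits = [(dist[(t, q)], count[(t, q)]) for q in accepting if (t, q) in dist]
    if not hits:
        return None
    best = min(d for d, _ in hits)
    return best, sum(c for d, c in hits if d == best)


E_STAR = {(0, "E"): 0}                   # one-state DFA for E*
G1 = [("s", "E", "a"), ("s", "E", "b"), ("a", "E", "t"), ("b", "E", "t"),
      ("s", "E", "c"), ("c", "E", "d"), ("d", "E", "t")]

EVEN = {(0, "E"): 1, (1, "E"): 0}        # DFA for (E.E)*
G2 = [("s", "E", "a"), ("a", "E", "b"), ("b", "E", "s"), ("s", "E", "t")]
```

On G1, pattern E* has two shortest s-t paths of length 2; the length-3 path via c and d is ignored. On G2, the shortest even-length s-t path goes around the triangle once.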

Contrasting Semantics

• Pattern E* over the graph:

[Figure: example graph from s to t in which every edge is labeled E]

• s-t is an answer under all semantics, but
  – simple-path: s-t has multiplicity 2
  – unique-edge: s-t has multiplicity 3
  – shortest-path: s-t has multiplicity 1
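The graph in the figure is not recoverable from this transcript. The three multiplicities can still be replayed on a small hypothetical graph chosen to produce the same numbers, though it is not necessarily the figure's graph. Its edges are s→t, s→a, a→t, t→b, and b→a, all labeled E, so every path matches E*. The Python sketch below enumerates simple paths and edge-unique paths by depth-first search, and counts shortest paths by breadth-first search.

```python
from collections import deque

# Hypothetical graph (all edges labeled E, so every path matches E*).
EDGES = [("s", "t"), ("s", "a"), ("a", "t"), ("t", "b"), ("b", "a")]


def count_paths(edges, s, t, mode):
    """Count s-t paths under 'simple' (no repeated vertex) or 'trail'
    (no repeated edge) semantics by depth-first enumeration."""
    succ = {}
    for i, (x, y) in enumerate(edges):
        succ.setdefault(x, []).append((i, y))       # keep the edge index to detect repeats
    total = 0

    def dfs(v, used_vertices, used_edges):
        nonlocal total
        if v == t:
            total += 1                              # a path may also continue past t
        for i, w in succ.get(v, []):
            if mode == "simple" and w in used_vertices:
                continue
            if mode == "trail" and i in used_edges:
                continue
            dfs(w, used_vertices | {w}, used_edges | {i})

    dfs(s, frozenset({s}), frozenset())
    return total


def count_shortest(edges, s, t):
    """Number of shortest s-t paths, by breadth-first search with path counting."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)
    dist, cnt = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in succ.get(v, []):
            if w not in dist:
                dist[w], cnt[w] = dist[v] + 1, cnt[v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:
                cnt[w] += cnt[v]
    return cnt.get(t, 0)
```

Here the simple paths are s→t and s→a→t. The edge-unique semantics adds s→t→b→a→t, which revisits t without reusing an edge. Only s→t is shortest.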

Aggregation

Let’s See It First as a CQ Extension

• Count toys bought in common per customer pair:
  Q(c1, c2, count(p)) :- c1 -Bought-> p, c2 -Bought-> p, p.category = "toys", c1 < c2

• c1, c2: composite group key; no explicit group-by clause

• Standard syntax for aggregation-extended CQs and Datalog

• Rich literature on semantics
  – (tricky for Datalog when aggregation and recursion interleave)
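Read operationally, the aggregation-extended CQ enumerates all variable bindings that satisfy the body and counts them per group key. Here is a Python sketch over a hypothetical edge list, where a `Counter` plays the role of the implicit group-by on (c1, c2).

```python
from collections import Counter

bought = [("ann", "lego"), ("ann", "doll"), ("bob", "lego"),
          ("bob", "doll"), ("cat", "lego"), ("bob", "book")]   # hypothetical (customer, product) edges
category = {"lego": "toys", "doll": "toys", "book": "books"}

# Q(c1, c2, count(p)) :- c1 -Bought-> p, c2 -Bought-> p, p.category = "toys", c1 < c2
in_common = Counter(
    (c1, c2)                        # non-aggregated head variables form the group key
    for (c1, p1) in bought
    for (c2, p2) in bought
    if p1 == p2 and category[p1] == "toys" and c1 < c2
)
```

Each satisfying binding (c1, c2, p) contributes one to its group, so ann and bob, who share lego and doll, get a count of 2.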

Aggregation in Modern Graph QLs

• Cypher’s RETURN clause uses syntax similar to that of aggregation-extended CQs

• Gremlin and SPARQL use an SQL-style GROUP BY clause

• GSQL uses aggregating containers called “accumulators”

Flavor of Representative Languages

Running Example in CRPQ Form

• Recall: count toys bought in common per customer pair
  Q(c1, c2, count(p)) :- c1 -Bought-> p, c2 -Bought-> p, p.category = "toys", c1 < c2

SPARQL

• Query language for the semantic web
  – graphs corresponding to RDF data are directed, labeled graphs

• W3C Standard Recommendation

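Running Example in SPARQL

PREFIX : <http://example.org/>    # hypothetical namespace for the example vocabulary
SELECT ?c1 ?c2 (COUNT(?p) AS ?inCommon)
WHERE {
  ?c1 :bought ?p .
  ?c2 :bought ?p .
  ?p :category ?cat .
  FILTER (?cat = "toys" && STR(?c1) < STR(?c2))
}
GROUP BY ?c1 ?c2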

SPARQL Semantics by Example

• Coincides with the CRPQ version

Q(c1, c2, count(p)) :- c1 -Bought-> p, c2 -Bought-> p, p.category = "toys", c1 < c2

Cypher

• The query language of the neo4j commercial native graph DB system

• Essentially StruQL with some bells and whistles

• Also supported in a variety of other systems:
  – SAP HANA Graph, AgensGraph, RedisGraph, Memgraph, CAPS (Cypher for Apache Spark), ingraph, Gradoop, Ruruki, Graphflow

Running Example in Cypher

MATCH (c1:Customer)-[:Bought]->(p:Product)<-[:Bought]-(c2:Customer)
WHERE p.category = "Toys" AND c1.name < c2.name
RETURN c1.name AS cust1, c2.name AS cust2, COUNT(p) AS inCommon

c1.name, c2.name form the composite group key; there is no explicit group-by clause, just like in CQs

Cypher Semantics by Example

• Coincides with the CRPQ version

Q(c1, c2, count(p)) :- c1 -Bought-> p, c2 -Bought-> p, p.category = "toys", c1 < c2

• Modulo the non-repeating-edge restriction
  – no effect here, since repeated-edge paths satisfying the two PE atoms would necessarily have c1 = c2

Gremlin

• Supported by major Apache projects
  – TinkerPop and JanusGraph

• Also by commercial systems, including
  – TitanGraph (DataStax)
  – Neptune (Amazon)
  – Azure (Microsoft)
  – IBM Graph

Gremlin Semantics

• Based on traversers, i.e., tokens that flow through the graph, binding variables along the way

• A Gremlin program adorns the graph with a set of traversers that co-exist simultaneously

• A program is a pipeline of steps; each step works on the set of traversers whose state corresponds to this step

• Steps can be
  – map steps (work in parallel on individual traversers)
  – reduce steps (aggregate a set of traversers into a single traverser)

Gremlin Semantics by Example

V()

place one traverser on each vertex


V().hasLabel('Customer')

filter traversers by label


V().hasLabel('Customer').as('c1')

extend each traverser t: bind variable 'c1' to the vertex where t resides


V().hasLabel('Customer').as('c1').out('Bought')

Traversers flow along out-edges of type 'Bought'. If multiple such edges emanate from a Customer vertex v, the traverser at v splits into one copy per edge, placed at the edge's destination.


V().hasLabel('Customer').as('c1').out('Bought').hasLabel('Product').has('category','Toys')

filter traversers at the destination of 'Bought' edges: the vertex label must be 'Product', and the vertex must have a property named 'category' with value 'Toys'


V().hasLabel('Customer').as('c1').out('Bought').hasLabel('Product').has('category','Toys').as('p')

extend surviving traversers with a binding of variable 'p' to their location vertex;
now each surviving traverser has two variable bindings: c1, p


V().hasLabel('Customer').as('c1').out('Bought').hasLabel('Product').has('category','Toys').as('p').in('Bought')

Surviving traversers cross incoming edges of type 'Bought'. Multiple in-edges result in further splits.


V().hasLabel('Customer').as('c1').out('Bought').hasLabel('Product').has('category','Toys').as('p').in('Bought').hasLabel('Customer').as('c2').select('c1','c2','p').by('name')

keep only traversers at Customer vertices and bind variable 'c2'; then, for each traverser, extract the tuple of bindings for variables c1, c2, p and return its projection on the 'name' property


V().hasLabel('Customer').as('c1').out('Bought').hasLabel('Product').has('category','Toys').as('p').in('Bought').hasLabel('Customer').as('c2').select('c1','c2','p').by('name').where('c1',lt('c2'))

filter these tuples according to the where condition


V().hasLabel('Customer').as('c1').out('Bought').hasLabel('Product').has('category','Toys').as('p').in('Bought').hasLabel('Customer').as('c2').select('c1','c2','p').by('name').where('c1',lt('c2')).group().by(select('c1','c2')).by(count())

group tuples: the first by() specifies the group key, the second by() specifies the group aggregation
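The step-by-step narrative can be replayed with a small Python simulation. In it, a traverser is a pair (current vertex, variable bindings), and each step maps a list of traversers to a new list. This is an illustrative model of the semantics, not TinkerPop's implementation. The graph and helper names are assumptions, and so is the use of a tuple instead of Gremlin's map-valued group key.

```python
from collections import Counter

# Hypothetical toy graph: vertex id -> properties (including its label), plus labeled edges.
V = {
    1: {"label": "Customer", "name": "ann"},
    2: {"label": "Customer", "name": "bob"},
    3: {"label": "Product", "name": "lego", "category": "Toys"},
    4: {"label": "Product", "name": "novel", "category": "Books"},
}
E = [(1, "Bought", 3), (2, "Bought", 3), (2, "Bought", 4), (1, "Bought", 4)]

# A traverser is a pair (current vertex, dict of variable bindings).


def start():                              # V(): one traverser per vertex
    return [(v, {}) for v in V]


def has_label(ts, label):                 # filter step on the vertex label
    return [(v, b) for v, b in ts if V[v]["label"] == label]


def has(ts, key, value):                  # filter step on a property
    return [(v, b) for v, b in ts if V[v].get(key) == value]


def as_(ts, name):                        # bind a variable to the current vertex
    return [(v, {**b, name: v}) for v, b in ts]


def out(ts, label):                       # one copy per matching out-edge
    return [(dst, b) for v, b in ts for src, l, dst in E if src == v and l == label]


def in_(ts, label):                       # one copy per matching in-edge
    return [(src, b) for v, b in ts for src, l, dst in E if dst == v and l == label]


ts = as_(has_label(start(), "Customer"), "c1")
ts = as_(has(has_label(out(ts, "Bought"), "Product"), "category", "Toys"), "p")
ts = as_(has_label(in_(ts, "Bought"), "Customer"), "c2")
rows = [{k: V[b[k]]["name"] for k in ("c1", "c2", "p")} for _, b in ts]  # select(...).by('name')
rows = [r for r in rows if r["c1"] < r["c2"]]                            # where('c1', lt('c2'))
groups = Counter((r["c1"], r["c2"]) for r in rows)                       # group().by(...).by(count())
```

Before the where filter there are four traversers: ann and bob each reach lego and come back to both customers. The filter keeps only the (ann, bob) tuple.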

GSQL

• The query language of TigerGraph, a native parallel graph DB system

• A recent start-up founded by UCSD DB lab’s PhD alum Yu Xu

• Full disclosure: I have been involved in its design

GSQL Accumulators

• GSQL traversals collect and aggregate data by writing it into accumulators

• Accumulators are containers (data types) that
  – hold a data value
  – accept inputs
  – aggregate inputs into the data value using a binary operation

• May be built-in (sum, max, min, etc.) or user-defined

• May be
  – global (a single container accessible from all traversal steps)
  – local (one per node, accessible only when that node is reached by the traversal)
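To make the container idea concrete, here is a Python sketch of three accumulator types whose `+=` aggregates an input into the held value. The class names mirror GSQL's SumAccum, MaxAccum, and GroupByAccum, but they are hypothetical stand-ins, not TigerGraph code. A local accumulator is modeled simply as one instance per vertex.

```python
import math


class SumAccum:
    """Holds a number; += adds each input to it."""

    def __init__(self, value=0):
        self.value = value

    def __iadd__(self, x):
        self.value += x
        return self


class MaxAccum:
    """Holds a number; += keeps the larger of the current value and the input."""

    def __init__(self, value=-math.inf):
        self.value = value

    def __iadd__(self, x):
        self.value = max(self.value, x)
        return self


class GroupByAccum:
    """Maps each group key to its own nested accumulator, created on the first input."""

    def __init__(self, make_accum):
        self.make_accum = make_accum
        self.groups = {}

    def __iadd__(self, key_and_input):
        key, x = key_and_input
        if key not in self.groups:
            self.groups[key] = self.make_accum()
        self.groups[key] += x          # delegates to the nested accumulator's +=
        return self


# Global accumulator, fed one (group key, input) pair per matched path.
res = GroupByAccum(SumAccum)
for c1, c2 in [("ann", "bob"), ("ann", "bob"), ("ann", "cat")]:
    res += ((c1, c2), 1)

# Local accumulators: one instance per vertex.
local_max = {v: MaxAccum() for v in ("ann", "bob")}
for v, x in [("ann", 3), ("ann", 7), ("bob", 5)]:
    local_max[v] += x
```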

Running Example in GSQL

GroupByAccum<string cust1, string cust2, SumAccum<int> inCommon> @@res;

SELECT _
FROM Customer:c1 -(Bought>)- Product:p -(<Bought)- Customer:c2
WHERE p.category == "Toys" AND c1.name < c2.name
ACCUM @@res += (c1.name, c2.name -> 1);

• @@res: a global accum
• cust1, cust2 form the composite group key; inCommon is the group value (a sum aggregation)
• (c1.name, c2.name -> 1) creates an input associating value 1 with key (c1.name, c2.name)
• @@res += … aggregates this input into the accumulator

GSQL Semantics by Example

For every distinct path satisfying the FROM pattern and the WHERE condition of the running example, execute the ACCUM clause.

Why Aggregate in Accumulators Instead of Select-Group-By Clauses?

GroupByAccum<string cust, SumAccum<float> total> @@cSales;
GroupByAccum<string prod, SumAccum<float> total> @@pSales;

SELECT _
FROM Customer:c -(Bought>:b)- Product:p
ACCUM float thisSalesRevenue = b.quantity * (1 - b.discount) * p.price,
      @@cSales += (c.name -> thisSalesRevenue),
      @@pSales += (p.name -> thisSalesRevenue);

• @@cSales: revenue per customer
• @@pSales: revenue per product
• multiple aggregations in one pass, even on different group keys
• thisSalesRevenue: a local variable (this is a let clause)
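The one-pass, two-group-key pattern corresponds to a single loop over the matched Bought edges that feeds two independent aggregations. Here is a Python sketch with made-up prices, quantities, and discounts (each discount stored as a fraction). In it, `defaultdict(float)` stands in for the two GroupByAccum<..., SumAccum<float>> containers.

```python
from collections import defaultdict

price = {"lego": 20.0, "doll": 10.0}
bought = [  # hypothetical Bought edges: (customer, product, quantity, discount as a fraction)
    ("ann", "lego", 2, 0.5),
    ("ann", "doll", 1, 0.0),
    ("bob", "lego", 1, 0.0),
]

c_sales = defaultdict(float)   # plays the role of @@cSales: revenue per customer
p_sales = defaultdict(float)   # plays the role of @@pSales: revenue per product
for c, p, quantity, discount in bought:                         # one pass over the matched edges
    this_sales_revenue = quantity * (1 - discount) * price[p]   # the "let"-style local variable
    c_sales[c] += this_sales_revenue
    p_sales[p] += this_sales_revenue
```

With a SELECT ... GROUP BY formulation, each group key would typically need its own query or a subsequent re-aggregation.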

Local Accumulators

• Minimize bottlenecks due to shared global accums; maximize opportunities for parallel evaluation

SumAccum<float> @cSales, @pSales;

SELECT _
FROM Customer:c -(Bought>:b)- Product:p
ACCUM float thisSalesRevenue = b.quantity * (1 - b.discount) * p.price,
      c.@cSales += thisSalesRevenue,
      p.@pSales += thisSalesRevenue;

• local accums: one instance per node
• groups are distributed: each node accumulates its own group

Role of the SELECT Clause? Compositionality

• Queries can output a set of nodes, stored in variables

• These are used by subsequent queries as traversal starting points:

S1 = SELECT t FROM S0:s -pattern1- T1:t WHERE … ACCUM …
S2 = SELECT t FROM S1:s -pattern2- T2:t … WHERE … ACCUM …
S3 = SELECT t FROM S1:s -pattern3- T3:t … WHERE … ACCUM …

• Variable S1 stores the set of nodes reached in the traversal
• The node set variable S1 is used in subsequent traversals (query chaining)

Recommended Toys Ranked by Log-Cosine Similarity

SumAccum<float> @rank, @lc;
SumAccum<int> @inCommon;
I = {Customer.1};

ToysILike, OthersWhoLikeThem =
    SELECT p, o
    FROM I:c -(Likes>)- Product:p -(<Likes)- Customer:o
    WHERE p.category == "Toys" AND o != c
    ACCUM o.@inCommon += 1
    POST-ACCUM o.@lc = log(1 + o.@inCommon);

ToysTheyLike =
    SELECT t
    FROM OthersWhoLikeThem:o -(Likes>)- Product:t
    WHERE t.category == "Toys"
    ACCUM t.@rank += o.@lc;

RecommendedToys = ToysTheyLike MINUS ToysILike;
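Here is the same ranking logic as a Python sketch over a made-up Likes relation. It proceeds in three steps:

1. For every other customer, count the toys they like in common with the seed customer, and turn that count into a weight log(1 + inCommon).
2. Add each such weight to every toy that customer likes.
3. Drop the toys the seed customer already likes.

The names mirror the GSQL variables, but the data and the function `recommend` are hypothetical.

```python
import math
from collections import defaultdict

likes = [  # hypothetical Likes edges: (customer, product)
    ("me", "lego"), ("me", "doll"),
    ("ann", "lego"), ("ann", "doll"), ("ann", "kite"),
    ("bob", "lego"), ("bob", "yoyo"),
]
category = {"lego": "Toys", "doll": "Toys", "kite": "Toys", "yoyo": "Toys"}


def recommend(me):
    toys_i_like = {p for c, p in likes if c == me and category[p] == "Toys"}
    # First query: others who like the same toys, and how many they share with me.
    in_common = defaultdict(int)
    for o, p in likes:
        if o != me and p in toys_i_like:
            in_common[o] += 1
    lc = {o: math.log(1 + n) for o, n in in_common.items()}    # POST-ACCUM: log-cosine weight
    # Second query: toys those customers like, ranked by the summed weights.
    rank = defaultdict(float)
    for o, t in likes:
        if o in lc and category[t] == "Toys":
            rank[t] += lc[o]
    recommended = set(rank) - toys_i_like                      # ToysTheyLike MINUS ToysILike
    return sorted(recommended, key=lambda t: rank[t], reverse=True), dict(rank)
```

Here ann shares two toys with "me" and bob shares one. So kite (liked by ann, weight log 3) ranks above yoyo (liked by bob, weight log 2).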

Control Flow Primitives

Loops Are Essential

• Loops (until a condition is satisfied)
  – explicitly supported in Gremlin and GSQL
  – necessary to program iterative algorithms like PageRank, recommender systems, shortest paths, etc.
  – can be used to program the matching of Kleene-starred path expressions under various semantics

• If-then-else, case constructs
  – supported by all QLs in some way

PageRank in GSQL

CREATE QUERY pageRank(float maxChange, int maxIteration, float dampingFactor) {
  MaxAccum<float> @@maxDifference = 9999;  // max score change in an iteration
  SumAccum<float> @received_score = 0;     // sum of scores received from neighbors
  SumAccum<float> @score = 1;              // initial score for every vertex is 1
  AllV = {Page.*};                         // start with all vertices of type Page
  WHILE @@maxDifference > maxChange LIMIT maxIteration DO
    @@maxDifference = 0;
    S = SELECT s
        FROM AllV:s -(Linkto)-> :t
        ACCUM t.@received_score += s.@score / s.outdegree()
        POST-ACCUM s.@score = 1 - dampingFactor + dampingFactor * s.@received_score,
                   s.@received_score = 0,
                   @@maxDifference += abs(s.@score - s.@score');  // s.@score' is the previous value; += on a MaxAccum keeps the maximum
  END;
}
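For readers who want to trace the iteration, here is a Python sketch of the same loop. An ACCUM phase pushes score / outdegree along every link. A POST-ACCUM phase then recomputes each score and tracks the largest change. The sketch assumes every page has at least one out-link. As written, the GSQL POST-ACCUM clause only updates vertices that appear as link sources, so on such graphs the two agree. The function and parameter names are illustrative.

```python
def page_rank(links, max_change, max_iteration, damping):
    """Iterative (non-normalized) PageRank following the GSQL loop structure.

    links: dict page -> list of pages it links to; assumes every page has >= 1 out-link.
    """
    score = {v: 1.0 for v in links}               # @score = 1 for every vertex
    max_difference = float("inf")                  # ensures the loop body runs at least once
    iteration = 0
    while max_difference > max_change and iteration < max_iteration:
        # ACCUM: every link s -> t sends s's score divided by s's out-degree.
        received = {v: 0.0 for v in links}
        for s, targets in links.items():
            for t in targets:
                received[t] += score[s] / len(targets)
        # POST-ACCUM: recompute each score and track the largest change (MaxAccum).
        max_difference = 0.0
        for s in links:
            new_score = 1 - damping + damping * received[s]
            max_difference = max(max_difference, abs(new_score - score[s]))
            score[s] = new_score
        iteration += 1
    return score


scores = page_rank({"h": ["a", "b", "c"], "a": ["h"], "b": ["h"], "c": ["h"]},
                   max_change=1e-9, max_iteration=1000, damping=0.85)
```

In this hub-and-spoke example, the hub h collects the most score. Because no page is dangling, the scores keep summing to the number of pages (4) at every iteration.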

Low-level, NoSQL-style Programming for Parallel Graph Analytics

Think-Like-a-Vertex (TLAV), aka Vertex-Centric

• Parallel computing abstraction

• Conceptually, each vertex is a processor

• Vertices execute a vertex program in parallel

• Instances of vertex programs communicate via messages to neighbors

• Vertices typically execute in lockstep (via synchronization barriers)

Pregel: A TLAV Programming Abstraction

• Bulk-synchronous parallel computing abstraction

• Introduced by the Pregel system (Google)

• Supported in open-source systems
  – e.g., Giraph (Apache), GraphX (Apache Spark)

• A Pregel program executes a series of supersteps in lockstep

• During each superstep, vertices (in parallel)
  – receive inbound messages sent in the previous superstep
  – compute a new value for the vertex data
  – send messages to neighboring vertices (received in the next superstep)
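The superstep structure can be simulated sequentially. The Python sketch below implements the maximum-value propagation example commonly used to introduce Pregel. Each vertex adopts the largest value it hears about and forwards it only when its value changes. The computation halts when no messages are in flight. The simulation, graph, and values are illustrative. A real Pregel system distributes vertices across workers and uses an explicit vote-to-halt mechanism.

```python
def pregel_max_value(values, neighbors):
    """Simulate Pregel supersteps: each vertex ends with the maximum value it can reach.

    values: vertex -> number; neighbors: vertex -> list of vertices it sends messages to.
    Returns the final values and the number of supersteps executed.
    """
    value = dict(values)
    # Superstep 0: every vertex sends its initial value to its neighbors.
    inbox = {v: [] for v in value}
    for v in value:
        for w in neighbors[v]:
            inbox[w].append(value[v])
    supersteps = 1
    while any(inbox.values()):                    # halt when no messages are in flight
        outbox = {v: [] for v in value}
        for v, msgs in inbox.items():             # conceptually, all vertices run in parallel
            if msgs and max(msgs) > value[v]:     # compute: adopt a larger value
                value[v] = max(msgs)
                for w in neighbors[v]:            # send: only vertices that changed send
                    outbox[w].append(value[v])
        inbox = outbox                            # barrier: delivered in the next superstep
        supersteps += 1
    return value, supersteps


final, steps = pregel_max_value({"a": 3, "b": 6, "c": 2, "d": 1},
                                {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]})
```

On the path a-b-c-d with values 3, 6, 2, 1, the value 6 reaches every vertex, and the run takes four supersteps in total.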

Gather-Apply-Scatter (GAS)

• Isomorphic with Pregel when vertices evaluate in lockstep

• Also supports asynchronous evaluation

• Introduced by the GraphLab system (an open-source project)

• Each vertex program step is organized in three phases:
  – Gather: the vertex may directly access information from its one-hop neighborhood, aggregating it with a user-defined function
  – Apply: the vertex value is updated by incorporating this aggregated sum
  – Scatter: neighborhood values are updated using the result of the apply phase

• Communication abstraction: shared memory, not messaging

PowerGraph

• Refinement of the GAS abstraction to process edges in parallel
  – for load balancing in the presence of high-degree vertices

• The Gather phase executes a function that maps over edges

• Results of the edge map are reduced by a user-defined Sum function

• The Apply phase uses the reduced result

• Only edges incident on active vertices do work
  – vertices can be explicitly activated during the scatter phase

GSQL’s Edge-Map/Vertex-Reduce (EM/VR)

• Extends PowerGraph for flexibility
  – the user can define multiple independent reducers via accumulators
  – accums can be local or global
  – accums are first-class citizens: they persist across steps and can be mentioned by future steps
  – a parallel map over edges generates accum inputs
  – the reduce phase updates each accum value by aggregating all inputs into it

GSQL as a High-Level EM/VR Program

SumAccum<float> @cSales, @pSales;

SELECT p
FROM Customer:c -(Bought>:b)- Product:p
ACCUM float thisSalesRevenue = b.quantity * (1 - b.discount) * p.price,
      c.@cSales += thisSalesRevenue,
      p.@pSales += thisSalesRevenue;

• local accums implement two independent reducers
• the FROM clause filters the edges undergoing the map
• the ACCUM clause executes per edge and generates accum inputs
• the SELECT clause specifies the vertices to activate next

Summary

• We discussed representative high-level graph QLs
  – from the point of view of expressive power and semantics
  – de-emphasizing syntax

• We have seen NoSQL-style low-level parallel graph programming abstractions

• No need to choose between high-level and low-level (a false choice claimed in prior NoSQL-related debates)
  – abstraction levels can be harmonized (as shown for GSQL)

Topics Not Covered Here

• Creating/modifying vertices and edges
  – as opposed to just returning tables of variable bindings

• Non-scalar vertex and edge properties (these can be lists/arrays and other containers)

• Behavior when a vertex/edge property does not exist (options are comprehensively laid out in Part A on the hierarchical graph model)

• Graph schemas