STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Transcript of STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
David Bader, Jason Riedy, Henning Meyerhenke, David Ediger, Timothy Mattson
29 August 2011
Outline
Motivation
Technical: Overall streaming approach; clustering coefficients; connected components; community detection (in progress)
Related: Pasqual, a scalable de novo sequence assembler
Plans
2 / 33
Exascale Data Analysis
Health care Finding outbreaks, population epidemiology
Social networks Advertising, searching, grouping
Intelligence Decisions at scale, regulating algorithms
Systems biology Understanding interactions, drug design
Power grid Disruptions, conservation
Simulation Discrete events, cracking meshes
3 / 33
Graphs are pervasive
• Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.
• New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.
Astrophysics
Problem: outlier detection. Challenges: massive datasets, temporal variation. Graph problems: matching, clustering.
Bioinformatics
Problem: identifying target proteins. Challenges: data heterogeneity and quality. Graph problems: centrality, clustering.
Social informatics
Problem: emergent behavior, information spread. Challenges: new analysis, data uncertainty. Graph problems: clustering, flows, shortest paths.
4 / 33
These are not easy graphs.
Yifan Hu’s (AT&T) visualization of the Livejournal data set
5 / 33
Intel’s non-numeric computing program
Supporting massive, dynamic graph analysis across the spectrum of Intel platforms.
Interactions
• Workshop on Scalable Graph Libraries
• Co-sponsored by Georgia Tech & PNNL
• Hosted at Georgia Tech, 29-30 June 2011
• Attended by 33 from sponsors, industry, and academia.
(Timothy Mattson, Roger Golliver, Aydin Buluc, and John Gilbert)
• Hosting an Intel-loaned Westmere server
• Quad E7-8870 with 0.25 TiB of memory
• Active access by Guy Blelloch’s benchmark group, John Gilbert’s KDT group
• Program-related access for PAPI counter support
• Graph analysis minisymposium at SIAM PP, Feb. 2012
6 / 33
10th DIMACS Implementation Challenge
Graph Partitioning and Graph Clustering
• Many application areas identify vertex subsets with many internal and few external edges. Problems addressed include:
• What are the communities within an (online) social network?
• How do I speed up a numerical simulation by mapping it efficiently onto a parallel computer?
• How must components be organized on a computer chip such that they can communicate efficiently with each other?
• What are the segments of a digital image?
• Which functions are certain genes (most likely) responsible for?
• 12-13 February 2012, Atlanta, Georgia
• Paper deadline: 21 October 2011
• Co-sponsored by DIMACS; the Command, Control, and Interoperability Center for Advanced Data Analysis (CCICADA); Pacific Northwest National Laboratory; Sandia National Laboratories; and Deutsche Forschungsgemeinschaft (DFG).
http://www.cc.gatech.edu/dimacs10/
7 / 33
Overall streaming approach
Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722-1736, 2003.
Jason’s network via LinkedIn Labs
Assumptions
• A graph represents some real-world phenomenon.
• But not necessarily exactly!
• Noise comes from lost updates, partial information, ...
• We target massive, “social network” graphs.
• Small diameter, power-law degrees
• Small changes in massive graphs often are unrelated.
• The graph changes, but we don’t need a continuous view.
• We can accumulate changes into batches...
• But not so many that it impedes responsiveness.
8 / 33
Difficulties for performance
• What partitioning methods apply?
• Geometric? Nope.
• Balanced? Nope.
• Is there a single, useful decomposition? Not likely.
• Some partitions exist, but they don’t often help with balanced bisection or memory locality.
• Performance needs new approaches, not just standard scientific computing methods.
Jason’s network via LinkedIn Labs
9 / 33
STING’s focus
[Diagram: source data; summary; prediction / action; control; visualization; simulation / query.]
• STING manages queries against changing graph data.
• Visualization and control often are application specific.
• Ideal: Maintain many persistent graph analysis kernels.
• Keep one current snapshot of the graph resident.
• Let kernels maintain smaller histories.
• Also (a harder goal), coordinate the kernels’ cooperation.
10 / 33
STING and STINGER
[Flow: a batch of insertions / deletions is pre-processed (sort by source vertex, reconcile ins/del); a pre-change hook runs; the graph is altered (possibly “aging off” old edges); a post-change hook runs against the STINGER graph, yielding the affected vertices and the change in metric.]
• Batches provide two levels of parallelism.
• Busy loci of change: Know to share the busy points.
• Scattered changes: Parallel across (likely) independent changes.
• The massive graph is maintained in a data structure named STINGER.
11 / 33
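The batch pre-processing step above, sort by source vertex and reconcile insertions against deletions, can be sketched roughly as follows. The tuple format and the "last action wins" reconciliation policy are illustrative assumptions, not STINGER's actual interface.

```python
# Illustrative sketch of batch pre-processing. Each edge action is a tuple
# (source, dest, op) with op = +1 for insert, -1 for delete. The names and
# the last-action-wins policy are assumptions for illustration only.

def preprocess_batch(actions):
    # Reconcile: for each (source, dest) pair, keep only the final action.
    net = {}
    for src, dst, op in actions:
        net[(src, dst)] = op          # later actions override earlier ones
    # Sort by source vertex so updates to one adjacency list are contiguous,
    # exposing the two levels of parallelism described above.
    return [(s, d, op) for (s, d), op in sorted(net.items())]

batch = [(3, 7, +1), (1, 2, +1), (3, 7, -1), (1, 5, +1)]
print(preprocess_batch(batch))
```

With the sample batch, the duplicate actions on edge (3, 7) collapse to the final deletion, and the result is grouped by source vertex.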
STINGER
STING Extensible Representation:
• Rule #1: No explicit locking.
• Rely on atomic operations.
• Massive graph: Scattered updates and scattered reads rarely conflict.
• Use time stamps for some view of time.
12 / 33
Initial results
Prototype STING and STINGER
Monitoring the following properties:
1 clustering coefficients,
2 connected components, and
3 community structure (in progress).
High-level
• Support high rates of change, over 10k updates per second.
• Performance scales somewhat with available processing.
• Gut feeling: Scales as much with sockets as cores.
http://www.cc.gatech.edu/~bader/code.html
13 / 33
Experimental setup
Unless otherwise noted
Line      Model    Speed (GHz)  Sockets  Cores
Nehalem   X5570    2.93         2        4
Westmere  E7-8870  2.40         4        10
• Westmere loaned by Intel (thank you!)
• All memory: 1067MHz DDR3, installed appropriately
• Implementations: OpenMP, gcc 4.6.1, Linux ≈ 3.0 kernel
• Artificial graph and edge stream generated by R-MAT [Chakrabarti, et al.].
• Scale x, edge factor f ⇒ 2^x vertices, ≈ f · 2^x edges.
• Edge actions: 7/8 insertions, 1/8 deletions
• Results over five batches of edge actions.
• Caveat: No vector instructions or low-level optimizations yet.
14 / 33
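The R-MAT generator cited above recursively drops each edge into one of four quadrants of the adjacency matrix. A minimal sketch follows; the quadrant probabilities a, b, c, d are the usual R-MAT parameters, and the particular values here are illustrative, not the ones used in the experiments.

```python
import random

def rmat_edge(scale, a=0.55, b=0.1, c=0.1, d=0.25):
    """Generate one edge of a 2^scale-vertex R-MAT graph by recursively
    choosing a quadrant of the adjacency matrix with probabilities a,b,c,d."""
    u = v = 0
    for _ in range(scale):
        r = random.random()
        u <<= 1
        v <<= 1
        if r < a:            # upper-left quadrant: both high bits 0
            pass
        elif r < a + b:      # upper-right: destination bit set
            v |= 1
        elif r < a + b + c:  # lower-left: source bit set
            u |= 1
        else:                # lower-right: both bits set
            u |= 1
            v |= 1
    return u, v

# Scale 10, edge factor 8 would give 2^10 vertices, ~8 * 2^10 edges.
edges = [rmat_edge(10) for _ in range(8 * (1 << 10))]
```

Repeated quadrant choices skew edges toward low-numbered vertices, producing the power-law degree distributions these slides assume.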
Clustering coefficients
• Used to measure “small-world-ness” [Watts and Strogatz] and potential community structure
• Larger clustering coefficient ⇒ more inter-connected
• Roughly the ratio of the number of actual to potential triangles
[Figure: vertex v with neighbors i, j, m, n; i – v – j forms a closed triplet, m – v – n an open one.]
• Defined in terms of triplets.
• i – v – j is a closed triplet (triangle).
• m – v – n is an open triplet.
• Clustering coefficient: # of closed triplets / total # of triplets
• Locally around v or globally for the entire graph.
15 / 33
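The definition above, closed triplets over total triplets at a vertex, can be computed directly on a small adjacency-set representation. This is a static illustration of the metric itself, not the streaming update.

```python
def local_clustering(adj, v):
    """Local clustering coefficient of v: closed triplets / total triplets.
    adj maps each vertex to its set of neighbors."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0                      # no triplets centered at v
    # Each edge among v's neighbors closes one triplet centered at v.
    closed = sum(1 for i in nbrs for j in nbrs if i < j and j in adj[i])
    return closed / (d * (d - 1) / 2)   # total triplets: d choose 2

# Triangle 0-1-2 plus a pendant vertex 3 attached to 0:
# one of the three triplets centered at vertex 0 is closed.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0))
```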
Updating triangle counts
Given Edge {u, v} to be inserted (+) or deleted (-)
Approach Search for vertices adjacent to both u and v; update counts on those and on u and v
Three methods
Brute force Intersect the neighbors of u and v by iterating over each, O(d_u d_v) time.
Sorted list Sort u’s neighbors. For each neighbor of v, check whether it is in the sorted list.
Compressed bits Summarize u’s neighbors in a bit array, reducing the check for each of v’s neighbors to O(1) time. Approximate with Bloom filters. [MTAAP10]
All rely on atomic addition.
16 / 33
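Of the three methods, the sorted-list update can be sketched as below. The dictionary-of-sets graph representation is an illustrative assumption; in the parallel code the count updates are the atomic additions mentioned above.

```python
from bisect import bisect_left
from collections import defaultdict

def update_triangles(adj, tri, u, v, sign):
    """Update per-vertex triangle counts for an inserted (sign=+1) or
    deleted (sign=-1) edge {u, v} using the sorted-list method."""
    sorted_u = sorted(adj[u])          # sort u's neighbor list once
    delta = 0
    for w in adj[v]:                   # probe each neighbor of v
        i = bisect_left(sorted_u, w)   # binary search in u's sorted list
        if i < len(sorted_u) and sorted_u[i] == w:
            tri[w] += sign             # an atomic add in the parallel code
            delta += 1
    tri[u] += sign * delta             # u and v gain/lose delta triangles
    tri[v] += sign * delta

# Insert edge {0, 3}: vertex 1 is adjacent to both endpoints,
# so the new edge closes the triangle 0-1-3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
tri = defaultdict(int)
update_triangles(adj, tri, 0, 3, +1)
```

Sorting costs O(d_u log d_u) and each probe O(log d_u), a win over brute force when both endpoints have large degree.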
Batches of 10k actions
Graph size: scale 22, edge factor 16.
[Figure: updates per second (metric and STINGER) vs. threads (0-80). Brute force reaches about 3.4e+04 updates/s, the Bloom filter about 1.3e+05, and the sorted list about 1.4e+05. Machines: 4 x E7-8870 and 2 x X5570.]
17 / 33
Different batch sizes
Graph size: scale 22, edge factor 16.
[Figure: updates per second (metric and STINGER) vs. threads (0-80) for batch sizes of 100, 1,000, and 10,000 edge actions, comparing brute force, Bloom filter, and sorted list on 4 x E7-8870 and 2 x X5570.]
18 / 33
Connected components
• Maintain a mapping from vertex to component.
• Global property, unlike triangle counts
• In “scale free” social networks:
• Often one big component, and
• many tiny ones.
• Edge changes often sit within components.
• Remaining insertions merge components.
• Deletions are more difficult...
19 / 33
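For insertions, the vertex-to-component mapping above can be maintained with a standard union-find structure. This is a generic sketch of the merge step, not STINGER's actual implementation.

```python
def find(comp, v):
    """Follow the component mapping to its representative, compressing
    the path as we go so later queries are cheap."""
    while comp[v] != v:
        comp[v] = comp[comp[v]]        # path halving
        v = comp[v]
    return v

def insert_edge(comp, u, v):
    """An inserted edge either sits inside one component (no change)
    or merges two components."""
    ru, rv = find(comp, u), find(comp, v)
    if ru != rv:
        comp[max(ru, rv)] = min(ru, rv)   # merge: smaller label wins

comp = list(range(6))        # six isolated vertices
insert_edge(comp, 0, 1)
insert_edge(comp, 2, 3)
insert_edge(comp, 1, 2)      # merges {0, 1} with {2, 3}
print(find(comp, 3) == find(comp, 0))   # now the same component
```

Insertions only ever merge labels, which is why the deletion case on the next slides is the hard one.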
Connected components: Deleted edges
The difficult case
• Very few deletions matter.
• Determining which matter may require a large graph search.
• Re-running static component detection.
• (Long history; see related work in [MTAAP11].)
• Coping mechanisms:
• Heuristics.
• Second level of batching.
20 / 33
Deletion heuristics
Rule out effect-less deletions
• Use the spanning-tree by-product of static connected component algorithms.
• Ignore deletions when one of the following occurs:
1 The deleted edge is not in the spanning tree.
2 The endpoints share a common neighbor∗.
3 The loose endpoint can reach the root∗.
• In the last two (∗), also fix the spanning tree.
Rules out 99.7% of deletions.
21 / 33
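The three rules above can be sketched against a rooted spanning tree. The parent-array tree representation and helper names are illustrative assumptions, and rule 3 is checked here by a simple walk to the root rather than the real code's repair logic.

```python
def path_to_root(parent, w):
    """Set of vertices on the spanning-tree path from w up to the root."""
    path = {w}
    while parent[w] != w:
        w = parent[w]
        path.add(w)
    return path

def deletion_matters(adj, parent, u, v):
    """False when deleting {u, v} provably cannot split a component, by the
    three heuristic rules; True means a large recheck is still needed.
    (The real code also repairs the spanning tree for rules 2 and 3.)"""
    # Rule 1: the deleted edge is not in the spanning tree.
    if parent[u] != v and parent[v] != u:
        return False
    loose = u if parent[u] == v else v       # endpoint cut off from the root
    other = v if loose == u else u
    # Rule 2: the endpoints share a common neighbor.
    if (adj[u] & adj[v]) - {u, v}:
        return False
    # Rule 3: the loose endpoint reaches the root through another neighbor
    # whose tree path avoids the loose endpoint itself.
    for w in adj[loose] - {other}:
        if loose not in path_to_root(parent, w):
            return False
    return True

# Path 0-1-2 rooted at 0: deleting the tree edge {1, 2} strands vertex 2.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
parent = [0, 0, 1]
print(deletion_matters(adj, parent, 1, 2))
```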
Connected components: Performance
Graph size: scale 22, edge factor 16.
[Figure: updates per second (metric and STINGER) vs. threads for batch sizes of 100, 1,000, and 10,000. Rates range from about 2.0e+03 updates/s at batch size 100 to about 1.5e+05 at batch size 10,000, on 4 x E7-8870 and 2 x X5570.]
22 / 33
Community detection (work in progress)
Greedy, agglomerative partitioning
• Partition to maximize modularity, minimize conductance, ...
Seed set expansion
• Grow an optimal / “relevant” community around a selection.
• (Work with Jonny Dimond of KIT.)
23 / 33
Agglomerative community detection
Parallel greedy, agglomerative partitioning [PPAM11]
• Score edges by the optimization criteria.
• Choose a maximal, heavy-weight matching.
• Negate edge scores if minimizing conductance.
• Contract those edges.
• Mimics sequential optimizers, but produces different results.
24 / 33
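One pass of the scheme above, score the edges, match, then contract, can be sketched sequentially as below. The scoring function here is a stand-in (raw edge weight), not the modularity or conductance criteria, and the greedy heaviest-first matching is a sequential approximation of the parallel maximal matching.

```python
def agglomerate_once(edges, score):
    """One score/match/contract pass over a weighted edge list.
    edges: dict {(u, v): weight} with u < v. Returns the contracted
    edge list and a vertex -> supervertex mapping."""
    # 1. Score edges and greedily build a maximal matching, heaviest first.
    matched = {}
    for (u, v), w in sorted(edges.items(), key=lambda e: -score(*e[0], e[1])):
        if u not in matched and v not in matched:
            matched[u] = matched[v] = min(u, v)   # contract into one label
    # 2. Contract matched edges, summing weights of merged parallel edges.
    out = {}
    for (u, v), w in edges.items():
        cu, cv = matched.get(u, u), matched.get(v, v)
        if cu != cv:                               # drop internal edges
            key = (min(cu, cv), max(cu, cv))
            out[key] = out.get(key, 0) + w
    return out, matched

# Path-like graph: the heavy edges (0,1) and (2,3) are matched and
# contracted; the two cross edges merge into one weight-3 edge.
edges = {(0, 1): 5, (1, 2): 1, (2, 3): 4, (0, 2): 2}
contracted, mapping = agglomerate_once(edges, lambda u, v, w: w)
```

Repeating the pass on the contracted graph mimics the sequential agglomerative optimizers the slide mentions, while each pass stays parallelizable.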
Performance
• R-MAT results on the right.
• Livejournal: 15M vertices, 184M edges
• 6-12 hours on the E7-8870
• Highly variable performance.
• Algorithm under development.
[Figure: time (1.0-2.5 hours) vs. processors (2^2 to 2^6) on the R-MAT graph, colored by modularity (0.284-0.298); CPUs: dual X5570 (2.93 GHz) and quad E7-8870 (2.40 GHz).]
25 / 33
Related: Pasqual
A scalable de novo assembler for next-gen gene sequencing
Work by David Bader, Henning Meyerhenke, Xing Liu, Pushkar Pande.
• Next-generation sequencers produce mountains of small gene sequences.
• Assembling them into a genome: Yet another large graph problem.
• Pasqual forms a compressed overlap graph and traces paths.
• The only scalable and correct shared-memory assembler.
• Faster and uses less memory than other existing systems.
• Evaluation against the few distributed assemblers is ongoing.
• Algorithm extends to metagenomics.
http://www.cc.gatech.edu/pasqual/
26 / 33
Human genome, 33.5Mbp (cov 30)
Similar speed, better results
Length  Code        Time (min)  N50 (bp)  Errors
35      Velvet      >12h        1355      683
35      Edena       451.15      1375      3
35      ABySS       56.52       1412      521
35      SOAPdenovo  15.62       1470      485
35      Pasqual     15.50       1451      5
100     Velvet      132.68      6635      175
100     Edena       466.12      6545      7
100     ABySS       92.92       6229      136
100     SOAPdenovo  18.05       6879      142
100     Pasqual     19.15       7712      6
27 / 33
Zebrafish, 61Mbp (cov 30)
Far better speed and results
Length  Code        Time (min)  N50 (bp)  Errors
100     Velvet      256.02      4045      799
100     Edena       >12h        2661      0
100     ABySS       —           —         —
100     SOAPdenovo  31.83       3998      725
100     Pasqual     57.27       4535      0
200     Velvet      —           —         —
200     Edena       —           —         —
200     ABySS       —           —         —
200     SOAPdenovo  —           —         —
200     Pasqual     38.07       7911      0
28 / 33
Performance v. SOAPdenovo
[Figure: time vs. threads (1-80) for SOAPdenovo and Pasqual at coverage 30 and 50; labeled run times range from 4.65m to 166.67m.]
29 / 33
Plans
Community detection Improving the algorithm; pushing into streaming by de-agglomerating and restarting.
Seed set expansion Maintaining not only one expanded set but multiple, for high-throughput monitoring.
Microbenchmarks Expand on initial promising work on characterizing performance by the peak number of memory operations achieved; find bottlenecks by comparing with microbenchmarks.
Distributed/PGAS STINGER fits a PGAS model well (think SCC). Interested in exploring distributed algorithms.
Packaging Wrap STING into an easily downloaded and installed tool.
30 / 33
Bibliography I
D. Ediger, K. Jiang, E. J. Riedy, and D. A. Bader. Massive streaming data analytics: A case study with clustering coefficients. In Proceedings of the Workshop on Multithreaded Architectures and Applications (MTAAP’10), Apr. 2010.
D. Ediger, E. J. Riedy, and D. A. Bader. Tracking structure of streaming social networks. In Proceedings of the Workshop on Multithreaded Architectures and Applications (MTAAP’11), May 2011.
K. Madduri and D. A. Bader. Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis. In 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rome, Italy, May 2009.
31 / 33
Bibliography II
E. J. Riedy, D. A. Bader, K. Jiang, P. Pande, and R. Sharma. Detecting communities from given seeds in social networks. Technical Report GT-CSE-11-01, Georgia Institute of Technology, Feb. 2011.
E. J. Riedy, H. Meyerhenke, D. Ediger, and D. A. Bader. Parallel community detection for massive graphs. In Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics, Torun, Poland, Sept. 2011.
32 / 33
References I
D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In Proc. 4th SIAM Intl. Conf. on Data Mining (SDM), Orlando, FL, Apr. 2004. SIAM.
D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442, Jun 1998.
33 / 33