Efficient Cohesive Subgraph Detection in Parallel Yingxia Shao Lei Chen Bin Cui School of EECS,...


Efficient Cohesive Subgraph Detection in Parallel

Yingxia Shao, Lei Chen, Bin Cui

School of EECS, Peking University; Hong Kong University of Science and Technology


Outline

• Background
• Preliminaries
• PETA: parallel and efficient truss detection algorithm
  • Basic framework
  • Triangle complete subgraph
  • Subgraph-oriented model
  • Optimization techniques
• Evaluation
• Conclusion


Cohesive Subgraph

• Cohesive subgraph
  • identifies cohesive subgroups within social networks.
  • helps social network analysts focus on areas of the network that are likely to be fruitful.
• E.g., clique, n-clique, n-clan, n-club, k-plex, k-core, etc.

Image source: Large Scale Cohesive Subgraph Discovery for Social Network Visual Analysis, VLDB’13

Background


k-Truss

• A k-truss is a cohesive subgraph in which the support of each edge is at least k-2.
  • The support of an edge is the number of triangles that contain the edge.
• The maximal k-truss.

Problem Statement: Given a graph G and a threshold k, find the maximal k-truss in G.

The subgraph with black thick edges is a 4-truss.

[Figure: example graph divided into partitions P1, P2 and P3.]

Fundamental operation

• The operation computes the support of an edge.
  • Counting triangles around an edge (u, v).
• Two solutions
  • Classic solution [Cohen ’08, VLDB ’13]
    1. Sort the neighbors of each vertex in ascending order;
    2. Count triangles in O(d(u) + d(v)) time, where d(·) is the vertex degree.
  • Index-based solution [Wang ’12]
    1. When processing an edge (u, v), enumerate only the neighbors of the endpoint with the smaller degree;
    2. Test the existence of the third edge with the help of a hash table.
    • Time complexity: O(min(d(u), d(v)))
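The two solutions above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `support_sorted_merge` and `support_hash_index`, the dictionary-based adjacency layout, and the toy graph are all assumptions made for the example.

```python
def support_sorted_merge(sorted_adj, u, v):
    """Classic approach: intersect two ascending neighbor lists by merging."""
    a, b = sorted_adj[u], sorted_adj[v]
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1          # common neighbor closes a triangle
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def support_hash_index(adj_sets, u, v):
    """Index-based approach: scan the lower-degree endpoint, probe a hash set."""
    if len(adj_sets[u]) > len(adj_sets[v]):
        u, v = v, u             # make u the endpoint with fewer neighbors
    return sum(1 for w in adj_sets[u] if w != v and w in adj_sets[v])


# Toy undirected graph (hypothetical data): two triangles sharing edge (b, c).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]
adj_sets = {}
for x, y in edges:
    adj_sets.setdefault(x, set()).add(y)
    adj_sets.setdefault(y, set()).add(x)
sorted_adj = {x: sorted(ns) for x, ns in adj_sets.items()}

print(support_sorted_merge(sorted_adj, "b", "c"))
print(support_hash_index(adj_sets, "b", "c"))
```

On this toy graph both functions report a support of 2 for edge (b, c), because a and d are its two common neighbors.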

Preliminaries


A straightforward detection framework

• The framework was introduced by J. Cohen.

1. Enumerate triangles.
2. For each edge, record the number of triangles containing that edge.
3. Keep only edges whose support is at least k-2.
4. If step 3 dropped any edges, return to step 1.
5. The remaining graph is the maximal k-truss.
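The steps above can be sketched serially in Python. This is a single-machine illustration under stated assumptions: a simple undirected graph without self-loops, edges given as pairs, and the hypothetical function name `maximal_k_truss`. It applies the "support at least k-2" condition from the k-truss definition and is not the MapReduce or Pregel code.

```python
from itertools import combinations


def maximal_k_truss(edges, k):
    """Repeat: count triangles per edge, drop weak edges, until nothing changes."""
    remaining = {frozenset(e) for e in edges}
    while True:
        # Rebuild adjacency from the edges that are still alive.
        adj = {}
        for e in remaining:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        # Steps 1-2: the support of an edge is its number of common neighbors.
        support = {}
        for e in remaining:
            u, v = tuple(e)
            support[e] = len(adj[u] & adj[v])
        # Step 3: keep only edges contained in at least k-2 triangles.
        kept = {e for e in remaining if support[e] >= k - 2}
        # Steps 4-5: stop once an iteration drops nothing.
        if kept == remaining:
            return kept
        remaining = kept


# Hypothetical example: a 4-clique on {a, b, c, d} plus a triangle d-e-f.
clique = list(combinations("abcd", 2))
example_edges = clique + [("d", "e"), ("e", "f"), ("d", "f")]
truss4 = maximal_k_truss(example_edges, 4)
print(sorted(tuple(sorted(e)) for e in truss4))
```

For k = 4 only the six clique edges survive, since each edge of the attached triangle lies in a single triangle.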

• MapReduce solution [J. Cohen ’09]
  • Two MapReduce jobs.
  • One is for steps 1-2.
  • One is for step 3.

• Pregel-based solution [L. Quick ’12]
  • Three supersteps.
  • Two are for steps 1-2.
  • One is for step 3.

• Classic solution of the fundamental operation.

Inefficiency: high communication cost, large number of iterations


Contributions

• We propose a parallel and efficient truss detection algorithm.
• We introduce a subgraph-oriented programming model to implement the algorithm efficiently in popular graph computation systems.
• We address the edge-support law in real-world graphs.


PETA: Parallel and Efficient Truss detection Algorithm

• New detection framework behind PETA
  1. Each worker constructs a specially designed subgraph;
  2. Workers simultaneously detect their local k-trusses;
  3. Workers communicate updates only when it is unavoidable;
  4. Go to step 2 until all local k-trusses are stable.

PETA Basic framework

[Figure: New detection framework behind PETA, shown for partitions P1, P2 and P3. Panels: (a) original graph, (b) special subgraphs, (c) local k-trusses.]

Triangle Complete Subgraph

• Triangle Complete Subgraph (TC-subgraph)
  • For internal and cross edges, the TC-subgraph maintains all their triangles locally.
• Property
  • The TC-subgraph preserves sufficient knowledge.
  • Theorem 1 and Theorem 2 prove the correctness of the new framework in PETA with the TC-subgraph.
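A rough Python sketch of how a TC-subgraph might be assembled for one partition. The edge classes are an assumption inferred from the figure legend: an internal edge has both endpoints owned by the partition, a cross edge has exactly one, and an external edge has none but completes a triangle of an internal or cross edge. The name `tc_subgraph` and the `owner` mapping are hypothetical; refer to the paper for the exact definition.

```python
def tc_subgraph(edges, owner, part):
    """Return (internal, cross, external) edge sets for partition `part`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    internal, cross, external = set(), set(), set()
    for u, v in edges:
        owned = (owner[u] == part) + (owner[v] == part)
        if owned == 2:
            internal.add(frozenset((u, v)))
        elif owned == 1:
            cross.add(frozenset((u, v)))

    # Pull in the missing sides of every triangle around internal/cross edges.
    for e in internal | cross:
        u, v = tuple(e)
        for w in adj[u] & adj[v]:
            for side in (frozenset((u, w)), frozenset((v, w))):
                if side not in internal and side not in cross:
                    external.add(side)
    return internal, cross, external


# Hypothetical toy input: triangle a-b-c plus edge c-d, with a owned by P1.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
owner = {"a": "P1", "b": "P2", "c": "P2", "d": "P2"}
p1_internal, p1_cross, p1_external = tc_subgraph(edges, owner, "P1")
print(len(p1_internal), len(p1_cross), len(p1_external))
```

Here P1 owns only vertex a, so it has no internal edges, two cross edges (a-b and a-c), and one external edge (b-c) that completes their shared triangle.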

[Figure: TC-subgraph of partition P2, with a legend distinguishing internal, external and cross edges.]

PETA TC Subgraph


Subgraph-Oriented Model

• The subgraph-oriented model lets users process the local subgraph flexibly through properly designed APIs.
• In PETA, we can use the index-based approach to detect the local k-truss.

                       Vertex-centric Model           Subgraph-oriented Model
Accessible data        Vertex and one-hop neighbors   Entire local subgraph
Access pattern         Sequential                     Sequential/random
Local updates          Require extra supersteps       Bypass
User-defined function  Simple                         Fruitful
Expressivity           Good                           Better

*Refer to the paper for API design.

PETA Subgraph Model


Local subgraph algorithm for PETA

• The algorithm contains two phases.
  • Initialization phase.
    • Constructs the TC-subgraph via a triangle counting routine.
    • Requires two supersteps.
  • Detection phase.
    • Applies the index-based solution to compute the support of an edge.
    • First detection iteration: scan over internal and cross edges.
    • Successive detection iterations: modify the local k-truss based on the removal of external edges.

Seamless detection!

PETA Local algorithm


Efficiency analysis of PETA

• Computation complexity
  • It is the same as that of the best-known serial algorithms, O(m^1.5), where m is the number of edges.
• Communication complexity
  • Worst case is bounded by 3|Δ|.
• The number of iterations
  • It is minimal when a graph partition is given.
• Space complexity
  • Worst case is bounded by .
• Drawback
  • The worst-case space cost is attainable in theory (e.g., on a clique), so PETA may be infeasible for large-scale graphs.

PETA Efficiency


• Edge replicating factor (ρ)
  • ρ is the average number of times an edge is replicated.
  • A small ρ leads to low space, computation and communication costs.
  • A small ρ implies a small number of iterations.

Optimizations - I

[Figure: TC-subgraph of partition P2, with internal, external and cross edges marked.]

The edge cut ratio stands for cross-edge replication.

The third term is external-edge replication, where θ(e) is the support of an edge e.
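To make the edge replicating factor concrete, the sketch below measures it directly on a small partitioned graph instead of using the closed-form expression. It assumes that an edge is stored by every partition that owns one of its endpoints or owns a vertex completing a triangle with it (the same internal/cross/external reading as the TC-subgraph legend). The name `edge_replicating_factor` and the `owner` mapping are hypothetical.

```python
def edge_replicating_factor(edges, owner):
    """Average number of partitions that hold a copy of each edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    copies = 0
    for u, v in edges:
        # Partitions owning an endpoint keep the edge as internal or cross.
        holders = {owner[u], owner[v]}
        # Partitions owning a common neighbor keep it as an external edge.
        holders |= {owner[w] for w in adj[u] & adj[v]}
        copies += len(holders)
    return copies / len(edges)


# Hypothetical toy input: triangle a-b-c plus edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
split = {"a": "P1", "b": "P2", "c": "P2", "d": "P2"}
single = {v: "P1" for v in "abcd"}
rho_split = edge_replicating_factor(edges, split)
rho_single = edge_replicating_factor(edges, single)
print(rho_split, rho_single)
```

With everything in one partition no edge is duplicated and the factor is 1.0. Moving vertex a to its own partition replicates all three triangle edges, raising the factor to 1.75.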

Optimizations


Optimizations - II

• Edge-Support Law in real-world graphs
  • The frequency distribution of the initial support of edges follows a power law.
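The frequency distribution mentioned above can be tabulated with a few lines of Python. This is only a sketch of the measurement, with the hypothetical name `support_histogram` and a toy graph; it does not fit or test a power law.

```python
from collections import Counter


def support_histogram(edges):
    """Map each support value to the number of edges having that support."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Initial support of an edge = number of common neighbors of its endpoints.
    return Counter(len(adj[u] & adj[v]) for u, v in edges)


# Hypothetical toy input: a 4-clique on {a, b, c, d} plus a pendant edge d-e.
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
hist = support_histogram(edges)
for support, frequency in sorted(hist.items()):
    print(support, frequency)
```

On a real graph, plotting frequency against support on log-log axes is one way to inspect the shape of the distribution.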


Optimizations - III

• Edge-balanced partition strategy
  • Improves the performance of the algorithm further.
  • Uses METIS to generate a “good” partition.
  • Since METIS cannot balance the core edges directly, we assign each vertex’s degree as its weight and balance the degree as an indicator of core-edge balance.

Graph        E[θ(e)]  ρest   ρrand  ρmetis
livejournal  20.00    20.74  8.99   1.77
us-patent    2.36     3.25   3.13   1.19
wikitalk     5.93     7.53   4.52   3.31
dbpedia      7.61     9.11   6.77   2.10

Graphs are partitioned into 32 parts.


Evaluation

• All experiments are conducted on a cluster with 23 physical nodes.
• Baselines
  • Cohen-MR [J. Cohen ’09]
  • Orig-LQ [L. Quick ’12]

Graph        |V|     |E|
wikitalk     2.4M    4.7M
us-patent    3.8M    16.5M
livejournal  4.8M    42.9M
dbpedia      17.2M   117.4M


Efficiency

• On different datasets, PETA achieves a 5x to 6x speedup over the Pregel-based solutions (i.e., Orig-LQ and impr-LQ).
• Cohen-MR is at least 10x slower than the best method, so it is omitted from the figures for clarity.

[Figure panels: livejournal, us-patent]


Scalability

• The performance of PETA improves gracefully on both random and METIS-based partition schemes.

[Figure panels: 10-truss in dbpedia, 40-truss in dbpedia]

Conclusions

• We designed an efficient parallel k-truss detection algorithm, named PETA, and thoroughly proved its advantages.
• The subgraph-oriented model has the potential to improve the performance of other complex graph analysis tasks.
• In the future, we will solve other truss-related problems under this framework.


Q&A


Backup Expr. – the number of iterations

K-truss    Orig-LQ     Cohen-MR    PETA (Random)  PETA (METIS)
5-truss    2212(503)   1006(503)   21             9
10-truss   272(68)     136(68)     23             14
40-truss   112(28)     56(28)      14             6

The number of iterations for k-truss detection on dbpedia