The shortest path is not always a straight line

THE SHORTEST PATH IS NOT ALWAYS A STRAIGHT LINE: leveraging semi-metricity in large-scale graph analysis. Vasiliki Kalavri (KTH Royal Institute of Technology), Tiago Simas (Telefonica Research), Dionysios Logothetis (Facebook)

Transcript of The shortest path is not always a straight line

Page 1: The shortest path is not always a straight line

THE SHORTEST PATH IS NOT ALWAYS A STRAIGHT LINE

leveraging semi-metricity in large-scale graph analysis

Vasiliki Kalavri, KTH Royal Institute of Technology
Tiago Simas, Telefonica Research
Dionysios Logothetis, Facebook

Page 2: The shortest path is not always a straight line

Weighted graphs capture relationship strength

• distance, similarity, social proximity, rating, preference
• influential nodes, optimal propagation paths, communities, recommendations

[Figure: users Alice, Bob, and Max connected by weighted edges such as "42 likes" and "3 likes"]

Page 3: The shortest path is not always a straight line

Sparsification techniques reduce the graph size and still give exact or good approximate results

G → G’, f(G) ~ f(G’)

Page 4: The shortest path is not always a straight line

THE METRIC BACKBONE

Reduces the graph size while maintaining relevant structure

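The minimum subgraph of a weighted graph that preserves the shortest paths of the original graph

[Figure: example graph on vertices A, B, C, D, E with edge weights 2, 3, 10, 4, 2, 1, and its metric backbone, which keeps only the edges with weights 2, 3, 2, 1]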

Page 5: The shortest path is not always a straight line

WHAT CAN WE USE IT FOR?

• Exact computations
  • any algorithm that depends on the shortest paths
  • reachability, connectivity
  • betweenness centrality, closeness centrality

• Approximation
  • PageRank, random walks
  • eigenvector centrality
  • community detection, clustering

Improves community detection modularity and recommender systems accuracy

Page 7: The shortest path is not always a straight line

IMPACT ON LARGE-SCALE SYSTEMS

• Graph Databases
  • fewer edges => smaller path search space

• Batch Graph Processing
  • CPU and memory requirements depend on #messages
  • #messages proportional to #edges
  • fewer edges => improved analysis performance

• Graph Compression
  • fewer edges => storage reduction

Page 8: The shortest path is not always a straight line

BACKGROUND

Page 9: The shortest path is not always a straight line

SEMI-METRICITY

In a weighted graph, an edge is semi-metric if there exists a shorter indirect path between its endpoints

[Figure: example graph on vertices A, B, C, D, E with edge weights 2, 3, 10, 4, 2, 1]

CE is 1st-order semi-metric: C-D-E is a shorter 2-hop path

AD is 2nd-order semi-metric: A-B-C-D is a shorter 3-hop path

AB, BC, CD, DE are metric
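The definition can be checked mechanically. The sketch below is a minimal illustration, not the authors' code: it runs Dijkstra between the endpoints of every edge and calls the edge semi-metric when the shortest path is strictly shorter than the edge itself. The assignment of weights to edges is an assumption (the transcript lists only the weights 2, 3, 10, 4, 2, 1), chosen so that it agrees with the statements on the slide. The "order = hops - 1" rule is inferred from the two examples above, and the names `EDGES`, `build_adjacency`, `shortest_path`, and `classify` are hypothetical.

```python
import heapq

# Example graph from the slides. Which weight belongs to which edge is an
# assumption; the transcript only gives the multiset of weights.
EDGES = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10,
}

def build_adjacency(edges):
    """Undirected adjacency map: node -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def shortest_path(adj, source, target):
    """Dijkstra; returns (distance, hops), preferring fewer hops on ties."""
    inf = (float("inf"), 0)
    best = {source: (0, 0)}
    heap = [(0, 0, source)]
    while heap:
        dist, hops, node = heapq.heappop(heap)
        if node == target:
            return dist, hops
        if (dist, hops) > best.get(node, inf):
            continue  # stale heap entry
        for nxt, w in adj[node].items():
            cand = (dist + w, hops + 1)
            if cand < best.get(nxt, inf):
                best[nxt] = cand
                heapq.heappush(heap, (cand[0], cand[1], nxt))
    return float("inf"), 0

def classify(edges):
    """Label each edge as metric or semi-metric (with its assumed order)."""
    adj = build_adjacency(edges)
    labels = {}
    for (u, v), w in edges.items():
        dist, hops = shortest_path(adj, u, v)
        if dist < w:
            labels[(u, v)] = f"semi-metric (order {hops - 1})"
        else:
            labels[(u, v)] = "metric"
    return labels

if __name__ == "__main__":
    for edge, label in classify(EDGES).items():
        print(edge, label)
```

Under the assumed weights, CE and AD come out as semi-metric of order 1 and 2, and the other four edges as metric.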

Page 13: The shortest path is not always a straight line

BACKBONE ALGORITHM

Page 14: The shortest path is not always a straight line

BACKBONE CALCULATION

• Calculating the backbone:
  • find all semi-metric edges: 1 BFS per edge?
  • compute APSP and store O(N^2) paths

Can we calculate or approximate the backbone without solving APSP?
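For comparison, the APSP route mentioned in the bullets can be sketched in a few lines. This is a single-machine baseline using Floyd-Warshall (a choice made here, not something the slides name), and it makes the cost visible: the `dist` table holds one entry per pair of vertices. The edge weights are an assumed assignment of the weights shown in the example figure, and `backbone_via_apsp` is a hypothetical name.

```python
from itertools import product

# Assumed assignment of the example weights 2, 3, 10, 4, 2, 1 to edges.
EDGES = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10,
}

def backbone_via_apsp(edges):
    """Keep the edges whose weight equals the shortest-path distance."""
    nodes = sorted({n for edge in edges for n in edge})
    inf = float("inf")
    # O(N^2) table of pairwise distances.
    dist = {(a, b): (0 if a == b else inf) for a, b in product(nodes, nodes)}
    for (u, v), w in edges.items():
        dist[u, v] = dist[v, u] = min(dist[u, v], w)
    # Floyd-Warshall: product() varies its first argument slowest,
    # so k is the outermost loop, as the algorithm requires.
    for k, i, j in product(nodes, nodes, nodes):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    # An edge is metric when no path beats it (integer weights assumed,
    # so exact equality is safe here).
    return {e: w for e, w in edges.items() if dist[e] == w}

if __name__ == "__main__":
    print(sorted(backbone_via_apsp(EDGES)))
```

The cubic running time and quadratic memory are what make this approach impractical on large graphs.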

Page 16: The shortest path is not always a straight line

ORDER OF SEMI-METRICITY

Most semi-metric edges are 1st-order semi-metric

Page 18: The shortest path is not always a straight line

A 3-PHASE BACKBONE ALGORITHM

1. Find 1st-order semi-metric edges: only look at triangles
   Scalable & practical for large graphs

Page 20: The shortest path is not always a straight line

EXAMPLE

Phase 1

[Figure: the example graph before Phase 1 (edge weights 2, 3, 10, 4, 2, 1) and after Phase 1, where the edge of weight 4 has been removed (remaining weights 2, 3, 10, 2, 1)]
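Phase 1 only needs triangles: an edge (u, v) is 1st-order semi-metric when some common neighbor k gives u-k-v a smaller total weight. A minimal single-machine sketch follows. The edge weights are an assumed assignment of the weights in the figure, and the function names are hypothetical.

```python
# Assumed assignment of the example weights 2, 3, 10, 4, 2, 1 to edges.
EDGES = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10,
}

def build_adjacency(edges):
    """Undirected adjacency map: node -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def first_order_semi_metric(edges):
    """Edges that close a triangle whose other two sides are shorter in total."""
    adj = build_adjacency(edges)
    semi_metric = set()
    for (u, v), w in edges.items():
        # Common neighbors of u and v are exactly the triangles on this edge.
        common = adj[u].keys() & adj[v].keys()
        if any(adj[u][k] + adj[k][v] < w for k in common):
            semi_metric.add((u, v))
    return semi_metric

if __name__ == "__main__":
    removed = first_order_semi_metric(EDGES)
    remaining = {e: w for e, w in EDGES.items() if e not in removed}
    print("removed:", removed)
    print("remaining weights:", sorted(remaining.values()))
```

With the assumed weights, only CE is removed, which matches the remaining weights 2, 3, 10, 2, 1 in the figure. AD survives this phase because A and D share no neighbor.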

Page 23: The shortest path is not always a straight line

A 3-PHASE BACKBONE ALGORITHM

1. Find 1st-order semi-metric edges: only look at triangles
   Scalable & practical for large graphs
   Most semi-metric edges have been removed

2. Identify metric edges in 2-hop paths

Page 26: The shortest path is not always a straight line

EXAMPLE

Phase 2

[Figure: the example graph after Phase 1 (edge weights 2, 3, 10, 2, 1); four edges are labeled M (metric) and one edge is still unlabeled (?)]

The lowest-weight edge of every vertex is metric

[Figure: vertices u and v, with edge weights 2, 4, 2, 1]

any indirect path from u to v would have larger weight
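The rule on this slide translates directly into code. The sketch below covers only the lowest-weight-edge rule stated here, not necessarily everything Phase 2 does in the authors' implementation. It assumes strictly positive weights (otherwise an indirect path need not be longer) and an assumed assignment of the example weights to edges. `EDGES_AFTER_PHASE1` and the function names are hypothetical.

```python
# Example graph after Phase 1 (the weight-4 edge is gone). The assignment of
# the remaining weights 2, 3, 10, 2, 1 to edges is an assumption.
EDGES_AFTER_PHASE1 = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("A", "D"): 10,
}

def build_adjacency(edges):
    """Undirected adjacency map: node -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def lowest_weight_edges(edges):
    """Label as metric the cheapest edge incident to each vertex.

    Any indirect path out of u starts with an edge at least as heavy as
    u's cheapest edge and then adds more positive weight, so it is longer.
    """
    adj = build_adjacency(edges)
    metric = set()
    for u, neighbors in adj.items():
        v = min(neighbors, key=neighbors.get)
        metric.add(tuple(sorted((u, v))))  # edge keys are stored sorted
    return metric

if __name__ == "__main__":
    metric = lowest_weight_edges(EDGES_AFTER_PHASE1)
    unlabeled = set(EDGES_AFTER_PHASE1) - metric
    print("metric:", sorted(metric))
    print("unlabeled:", sorted(unlabeled))
```

Under the assumed weights this labels four edges as metric and leaves AD unlabeled, the same picture as the four M marks and the single question mark in the figure.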

Page 30: The shortest path is not always a straight line

A 3-PHASE BACKBONE ALGORITHM

1. Find 1st-order semi-metric edges: only look at triangles!
   Scalable & practical for large graphs!
   Most semi-metric edges have been removed

2. Identify metric edges in 2-hop paths

3. Run a BFS for remaining unlabeled edges (1%-9% of the edges)

Page 33: The shortest path is not always a straight line

EXAMPLE

Phase 3

[Figure: the example graph after Phase 2 (edge weights 2, 3, 10, 2, 1); four edges are labeled M (metric) and a BFS is run for the remaining unlabeled edge]

BFS: explore paths with shorter distances only

If the BFS arrives at the target, the edge is semi-metric
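A pruned search of this kind might look as follows. This is a sketch under assumptions, not the authors' implementation: it is a queue-based search that drops any partial path whose accumulated weight reaches the weight of the edge being tested, and it reports the edge as semi-metric as soon as the target is reached. The edge weights are an assumed assignment of the example weights, and all names are hypothetical.

```python
from collections import deque

# Example graph after Phase 1; the weight-to-edge assignment is an assumption.
EDGES_AFTER_PHASE1 = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("A", "D"): 10,
}

def build_adjacency(edges):
    """Undirected adjacency map: node -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def is_semi_metric(adj, u, v):
    """True if an indirect u-v path is strictly shorter than edge (u, v)."""
    limit = adj[u][v]
    best = {u: 0}            # best known distance from u to each visited node
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nxt, w in adj[node].items():
            if node == u and nxt == v:
                continue     # the direct edge itself does not count
            dist = best[node] + w
            # Explore paths with shorter distances only.
            if dist >= limit or dist >= best.get(nxt, limit):
                continue
            if nxt == v:
                return True  # reached the target below the limit
            best[nxt] = dist
            queue.append(nxt)
    return False

if __name__ == "__main__":
    adj = build_adjacency(EDGES_AFTER_PHASE1)
    print("AD semi-metric:", is_semi_metric(adj, "A", "D"))
    print("AB semi-metric:", is_semi_metric(adj, "A", "B"))
```

The pruning is what keeps the search local: for a light edge such as AB, every first step already exceeds the limit and the search stops immediately.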

Page 36: The shortest path is not always a straight line

EXAMPLE

Metric Backbone

[Figure: the metric backbone of the example graph, with remaining edge weights 2, 3, 2, 1]

Page 37: The shortest path is not always a straight line

DISTRIBUTED IMPLEMENTATION

code available: http://grafos.ml/okapi.html#analytics

Implementation in the vertex-centric model
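To give a feel for the vertex-centric model, here is a single-process simulation of how Phase 1 could be phrased as two supersteps. It is an illustration only: it is not the Okapi or Giraph code, it uses no Giraph API, and the message scheme (each vertex sends its weighted neighbor list to every neighbor) is an assumption. The edge weights are an assumed assignment of the example weights, and the names are hypothetical.

```python
# Assumed assignment of the example weights 2, 3, 10, 4, 2, 1 to edges.
EDGES = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10,
}

def build_adjacency(edges):
    """Undirected adjacency map: node -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def phase1_vertex_centric(edges):
    """Simulate two supersteps; returns (semi-metric edges, messages sent)."""
    adj = build_adjacency(edges)

    # Superstep 0: every vertex sends its weighted neighbor list
    # to each of its neighbors (one message per edge direction).
    inbox = {v: [] for v in adj}
    for u, neighbors in adj.items():
        for v in neighbors:
            inbox[v].append((u, neighbors))

    # Superstep 1: vertex v looks only at its own edges and its inbox.
    # For a sender u, a common neighbor k with w(u,k) + w(k,v) < w(u,v)
    # shows that edge (u, v) is 1st-order semi-metric.
    semi_metric = set()
    for v, messages in inbox.items():
        for u, u_neighbors in messages:
            direct = adj[v][u]
            for k, w_uk in u_neighbors.items():
                if k in adj[v] and w_uk + adj[v][k] < direct:
                    semi_metric.add(tuple(sorted((u, v))))

    messages_sent = sum(len(m) for m in inbox.values())
    return semi_metric, messages_sent

if __name__ == "__main__":
    semi_metric, messages_sent = phase1_vertex_centric(EDGES)
    print("semi-metric:", semi_metric)
    print("messages:", messages_sent)
```

In this scheme the number of messages is twice the number of edges, which is one way to see why removing edges reduces communication in later computations.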

Page 38: The shortest path is not always a straight line

EVALUATION

Page 39: The shortest path is not always a straight line

EVALUATION GOALS

• How does our algorithm compare to APSP?

• Are large, real-world graphs semi-metric?

• Can we improve graph analysis performance?

Page 40: The shortest path is not always a straight line

COMPARISON TO APSP

Computing APSP in Giraph
• multiple SSSPs: in the order of months for million-edge graphs
• multiple MSSPs, i.e. SSSPs from several sources in parallel: in the order of days for million-edge graphs

Our algorithm is 120-180x faster than SSSP and 11-14x faster than MSSP: order of hours for million-edge graphs

Page 44: The shortest path is not always a straight line

ALGORITHM PHASES

• Phase 1: very fast and scalable. Removes up to 90% of semi-metric edges
• Phase 2: moderately fast. Labels up to 60% of the unlabeled edges
• Phase 3: slow. Labels 1-9% of the total edges

Phase 1 is the fastest and most useful phase

Page 52: The shortest path is not always a straight line

PHASE 1 SCALABILITY

almost linear scalability

<200s on a billion-edge graph

Page 55: The shortest path is not always a straight line

SEMI-METRICITY IN REAL GRAPHS

Graph         |V|    |E|     metric                    semi-metricity
Facebook      190M   49.9B   custom                    26.5%
Twitter       40M    1.5B    jaccard                   39%
Tuenti        12M    685M    jaccard                   59%
Livejournal   4.8M   34M     jaccard                   40%
NotreDame     0.3M   1.5M    jaccard, adamic           45%-29%
DBLP          318K   1M      jaccard, adamic           23%-9%
Twitter-ego   81K    1.7M    jaccard, adamic           57%-39%
Movielens     1.6K   1.9M    jaccard                   88%
Facebook      1K     143K    #messages, message size   78%-77%
US-Airports   0.5K   6K      #passengers               72%
C-Elegans     0.3K   2.3K    #connections              17%

% 1st-order semi-metric edges => reduction in memory and communication
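To make the reduction concrete, the percentages can be turned into edge counts. The short sketch below does this for the rows that report a single percentage, reading the semi-metricity column as the fraction of edges that are 1st-order semi-metric and assuming Phase 1 removes exactly those edges. The rounded |E| values are taken from the table as printed, so the results are approximate, and `edges_after_phase1` is a hypothetical name.

```python
# (graph, |E|, fraction of 1st-order semi-metric edges), from the table above.
ROWS = [
    ("Facebook", 49.9e9, 0.265),
    ("Twitter", 1.5e9, 0.39),
    ("Tuenti", 685e6, 0.59),
    ("Livejournal", 34e6, 0.40),
    ("Movielens", 1.9e6, 0.88),
]

def edges_after_phase1(num_edges, semi_metric_fraction):
    """Return (edges removed, edges kept), rounded to whole edges."""
    removed = round(num_edges * semi_metric_fraction)
    kept = round(num_edges) - removed
    return removed, kept

if __name__ == "__main__":
    for name, num_edges, fraction in ROWS:
        removed, kept = edges_after_phase1(num_edges, fraction)
        print(f"{name:12s} removed {removed:>14,d}  kept {kept:>14,d}")
```

For the largest Facebook graph, for example, 26.5% of 49.9B edges is roughly 13.2B edges that no longer need to be stored or sent messages along.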

Page 57: The shortest path is not always a straight line

QUERY SPEEDUP ON NEO4J

6.7x speedup

Page 58: The shortest path is not always a straight line

APACHE GIRAPH SPEEDUP

Including the time to calculate the backbone

4x speedup

Page 59: The shortest path is not always a straight line

APACHE GIRAPH SPEEDUP

6x speedup

Page 60: The shortest path is not always a straight line

COMMUNICATION REDUCTION

Up to 70% for highly semi-metric graphs

Page 61: The shortest path is not always a straight line

BEST PRACTICES

When to use the backbone?

• semi-metric weighting schemes, e.g. neighborhood similarity
• we can amortize the overhead: e.g. many algorithms on the same graph, multiple distance queries
• lossy compression is ok

When not to use the backbone?

• for metric weighting schemes
• we need to run one-off analysis
• we need lossless compression

Page 62: The shortest path is not always a straight line

RECAP: MAIN CONTRIBUTIONS

• An algorithm for computing the metric backbone without solving APSP
• An open-source distributed implementation
• Graph query and graph analytics speedup on Neo4j and Apache Giraph

Page 63: The shortest path is not always a straight line

THE SHORTEST PATH IS NOT ALWAYS A STRAIGHT LINE

leveraging semi-metricity in large-scale graph analysis

Vasiliki Kalavri, KTH Royal Institute of Technology
Tiago Simas, Telefonica Research
Dionysios Logothetis, Facebook