
Complexity and Efficient Algorithms Group / Department of Computer Science

Approximating Structural Properties of Graphs by Random Walks

Christian Sohler

Very Large Networks

Examples: social networks, the human brain, crystals, chip design

Size: 10^9 to 10^23 vertices; petabytes of additional information are possible

Classical graph problems: connectivity, MinCut, MaxCut, graph clustering, graph isomorphism

Difficulties: the graph does not fit into main memory

Classification of Very Large Networks – A Vision

Example questions: Is a country a democracy or a totalitarian country? Is a patient schizophrenic? Is a piece of software malicious?

Formalization: Given a set of graphs with class labels (the training set), find a classifier for new graphs.

A typical scenario: hundreds or thousands of graphs; each graph is extremely large; the graphs are sparse

A possible approach: describe graphs by features (graph properties), then apply classical learning algorithms

The challenge: computing tens of thousands of features for graphs with billions of vertices

Example feature vector: (12, 3, -5, 10, 0, 0, …, 20, 3)

Classification of Very Large Networks – A Sampling Approach

Random sampling: compute a graph property approximately by random sampling

Informal question: What can we learn from the local structure of a sparse graph about its global properties?

Sampling from graphs: How can we sample a graph?

Examples of different sampling strategies:

1. Sample a set S of s vertices and look at all edges within S (the subgraph G[S] induced by S)

2. Sample a set S of s edges and look at the graph they form

3. Sample a set S of s vertices and perform a BFS from each of them

4. Sample a set S of s vertices and perform a random walk from each of them

Many more possibilities…
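To make strategies 1 and 4 concrete, here is a small Python sketch on adjacency lists (a dict from vertex to neighbor list; this representation and the function names are our assumptions, not part of the slides). On a sparse graph, strategy 1 typically returns very few edges, since two random vertices are rarely adjacent.

```python
import random

def induced_subgraph_sample(adj, s, rng=random):
    """Strategy 1: sample s vertices S and keep every edge with both endpoints in S."""
    S = set(rng.sample(list(adj), s))
    edges = {(u, v) for u in S for v in adj[u] if v in S and u < v}  # each edge once
    return S, edges

def random_walk_sample(adj, s, length, rng=random):
    """Strategy 4: sample s start vertices and record one random walk of `length` steps from each."""
    walks = []
    for start in rng.sample(list(adj), s):
        walk = [start]
        for _ in range(length):
            nbrs = adj[walk[-1]]
            if not nbrs:              # isolated vertex: the walk cannot move
                break
            walk.append(rng.choice(nbrs))
        walks.append(walk)
    return walks

# Usage on a cycle with 10 vertices.
cycle = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
rng = random.Random(1)
S, E = induced_subgraph_sample(cycle, 5, rng)
walks = random_walk_sample(cycle, 3, 4, rng)
```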

Question: Which is the right sampling strategy for my learning problem?

Answer: It depends on the problem…

Question 1: Assume you have a classification task that involves city maps. Which of our four sampling methods would you choose?

Possible answers:

1. Sample a set S of s vertices and look at all edges within S

2. Sample a set S of s edges and look at the graph they form

3. Sample a set S of s vertices and perform a BFS from each of them

4. Sample a set S of s vertices and perform a random walk from each of them

Question 2: Assume you have a classification task that involves social networks. Which of our four sampling methods would you choose?

Possible answers:

1. Sample a set S of s vertices and look at all edges within S

2. Sample a set S of s edges and look at the graph they form

3. Sample a set S of s vertices and perform a BFS from each of them

4. Sample a set S of s vertices and perform a random walk from each of them

First Wrap-Up

Motivation: Some classification problems involve sets of huge graphs. For some fundamental graph problems, no efficient algorithm is known.

Sampling approach: We would like to pick small samples from the graph(s) and use them for graph classification.

Challenge: There are many different sampling procedures. We need to understand which one is right for which problem.

Sampling from Very Large Networks

Property testing [Rubinfeld, Sudan, 1996; Goldreich, Goldwasser, Ron, 1998]: a formal framework to study sampling algorithms for very large networks

Relaxation of "standard" decision problems: We want to distinguish whether the input graph G has a property or is far from it. If G neither has the property nor is far from it, the algorithm may give an arbitrary answer. The algorithms are randomized, have bounded (worst-case) error probability, and look at only a small part of the graph.

Different graph models: dense graphs, bounded-degree graphs, directed graphs

Property Testing in Bounded Degree Graphs

Bounded-degree graphs [Goldreich, Ron, 2002]: an undirected graph G=(V,E) whose maximum degree is bounded by a constant D

Oracle access: V={1,…,n}, and n is known to the algorithm. Query(i,j) returns the j-th neighbor of vertex i, or a symbol indicating that this neighbor does not exist.

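Example (n = 5, D = 3; "/" marks a neighbor that does not exist):

vertex 1: 2 4 /
vertex 2: 1 3 5
vertex 3: 2 / /
vertex 4: 1 5 /
vertex 5: 2 4 /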
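The oracle model can be mimicked in a few lines of Python. This is a hypothetical helper, not part of the original material: it reads the five-vertex example as adjacency lists of vertices 1 to 5 with D = 3, returns None in place of the "/" symbol, and counts queries, since query complexity is the quality measure used later.

```python
class BoundedDegreeOracle:
    """Hypothetical oracle for a bounded-degree graph on V = {1, ..., n}."""

    def __init__(self, adjacency, D):
        self.adj = adjacency          # dict: vertex -> list of neighbors
        self.D = D                    # degree bound
        self.n = len(adjacency)       # n is known to the algorithm
        self.queries = 0              # number of oracle queries issued so far

    def query(self, i, j):
        """Return the j-th neighbor of vertex i (1-indexed), or None if it does not exist."""
        if not 1 <= j <= self.D:
            raise ValueError("j must lie in 1..D")
        self.queries += 1
        nbrs = self.adj[i]
        return nbrs[j - 1] if j <= len(nbrs) else None

# The five-vertex example, read as adjacency lists with D = 3.
example = BoundedDegreeOracle({1: [2, 4], 2: [1, 3, 5], 3: [2], 4: [1, 5], 5: [2, 4]}, D=3)
```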

Graph properties: A graph property is a set of graphs that is closed under isomorphism.

Definition [Goldreich, Ron, 2002]: G=(V,E) is ε-far from P if one has to modify more than εDn edges to obtain a bounded-degree graph with property P.

[Figure panels: "connected" vs. "ε-far"]

Property tester for property P [Goldreich, Ron, 2002]: has oracle access to the input graph G; accepts with probability at least 2/3 if G has property P; rejects with probability at least 2/3 if G is ε-far from P

Quality measures: query complexity (the maximum number of oracle queries) and running time

A First Example: Connectivity

ConnectivityTester(G, ε, D) [Goldreich, Ron, 2002]

(1) Sample a set S of s = 8/(εD) vertices uniformly at random from V

(2) For every vertex in S:

(3) Perform a BFS until

(a) 4/(εD) vertices have been discovered, or

(b) all vertices of a small connected component have been discovered

(4) If (b) occurs, then reject

(5) Accept
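A minimal Python sketch of ConnectivityTester, assuming the oracle is a function query(i, j) that returns None for a missing neighbor. The names connectivity_tester and make_query are hypothetical, and the guard that never rejects a graph whose single component is smaller than the BFS budget is our addition, so that every connected graph is accepted.

```python
import math
import random
from collections import deque

def connectivity_tester(n, D, eps, query, rng=random):
    """Sketch of ConnectivityTester(G, eps, D); returns True (accept) or False (reject).

    Vertices are 1..n and query(i, j) returns the j-th neighbor of i or None.
    """
    s = math.ceil(8 / (eps * D))       # step (1): number of sampled vertices
    budget = math.ceil(4 / (eps * D))  # step (3a): BFS stops after this many vertices
    for _ in range(s):
        start = rng.randint(1, n)      # uniform sample (with replacement, for simplicity)
        seen, frontier = {start}, deque([start])
        while frontier and len(seen) < budget:
            v = frontier.popleft()
            for j in range(1, D + 1):
                u = query(v, j)
                if u is not None and u not in seen:
                    seen.add(u)
                    frontier.append(u)
        # Steps (3b)/(4): the BFS ran out of vertices, so it found a whole component.
        # Extra guard (our addition): a component containing all n vertices is not a
        # witness of disconnectedness, so tiny connected graphs are still accepted.
        if not frontier and len(seen) < n:
            return False
    return True                        # step (5)

def make_query(adj):
    """Turn adjacency lists into the oracle query(i, j) with 1-indexed j."""
    return lambda i, j: adj[i][j - 1] if j <= len(adj[i]) else None

cycle = {i: [i % 100 + 1, (i - 2) % 100 + 1] for i in range(1, 101)}    # connected
matching = {i: [i + 1] if i % 2 else [i - 1] for i in range(1, 101)}  # 50 components
```

On the cycle every BFS reaches its budget, so the tester accepts; on the perfect matching every sampled vertex lies in a component of size 2, so it rejects.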

Observation: ConnectivityTester accepts every connected graph.

Claim: If G is ε-far from connected, then G has more than εDn/2 connected components.

Claim: At least εDn/4 of the connected components have size at most 4/(εD).
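This claim follows from a counting argument. The derivation below is our own reconstruction: call a component "large" if it has more than 4/(εD) vertices and "small" otherwise. Large components are disjoint, so there are fewer than n/(4/(εD)) of them, and subtracting from the count in the previous claim gives the bound.

```latex
\begin{aligned}
\#\{\text{large components}\} &< \frac{n}{4/(\varepsilon D)} = \frac{\varepsilon D n}{4},\\
\#\{\text{small components}\} &> \frac{\varepsilon D n}{2} - \frac{\varepsilon D n}{4} = \frac{\varepsilon D n}{4}.
\end{aligned}
```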

Theorem: ConnectivityTester is a property tester with query complexity O(1/(ε²D)).
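A sketch of why the theorem holds, in our own words and assuming the s samples are drawn independently. Completeness is the observation that connected graphs are accepted. For soundness, every small component contains at least one vertex, so a uniform vertex lands in a small component with probability at least εD/4; a BFS started there exhausts the component and triggers (b). Each BFS issues at most D queries for each of at most 4/(εD) vertices.

```latex
\begin{aligned}
\Pr_{v \in V}\bigl[v \text{ lies in a small component}\bigr] &\ge \frac{\varepsilon D n/4}{n} = \frac{\varepsilon D}{4},\\
\Pr\bigl[\text{no vertex of } S \text{ lies in a small component}\bigr] &\le \Bigl(1 - \frac{\varepsilon D}{4}\Bigr)^{8/(\varepsilon D)} \le e^{-2} < \frac{1}{3},\\
\#\text{queries} &\le \frac{8}{\varepsilon D} \cdot \frac{4}{\varepsilon D} \cdot D = \frac{32}{\varepsilon^2 D} = O\Bigl(\frac{1}{\varepsilon^2 D}\Bigr).
\end{aligned}
```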

Second Wrap-Up – Introduction to Property Testing

Property testing: approximately decide, based on random sampling, whether a graph has a property or is far from it. Quality measure: query complexity.

Connectivity: sampling + BFS; check whether the sample violates the property.

Question 3: Is the following algorithm a property tester for planarity (for the right choice of f)?

PlanarityTester(G, ε, D)

(1) Sample a set S of s = f(ε, D) vertices uniformly at random from V

(2) For every vertex in S:

(3) Perform a BFS until

(a) f(ε, D) vertices have been discovered, or

(b) the discovered graph is not planar

(4) If (b) occurs, then reject

(5) Accept

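Bad news: There is a class of graphs that are ε-far from planar and in which every cycle has length Ω(log n). For large n, a BFS that discovers only f(ε, D) vertices therefore sees no cycle, i.e. a forest, which is always planar, so the algorithm above accepts these graphs.

Good news: The sampling is fine; we just need to modify our acceptance condition.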

Random Walks, Stationary Distributions & Convergence

Random walk: In each step, move from the current vertex v to a neighbor chosen uniformly at random.

Convergence: If G is connected and not bipartite, a random walk converges to a unique stationary distribution with Pr[random walk is at vertex v] ∝ deg(v).
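A short sketch that checks the convergence statement numerically: it propagates the exact distribution of a simple random walk by power iteration (our choice of method) on a small connected, non-bipartite example graph and compares the result with π(v) = deg(v)/(2|E|). The function name and the example graph are illustrative assumptions.

```python
def random_walk_distribution(adj, start, steps):
    """Exact distribution of a simple random walk after `steps` steps (power iteration)."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, mass in dist.items():
            for u in adj[v]:
                nxt[u] += mass / len(adj[v])   # move to a uniformly random neighbor
        dist = nxt
    return dist

# Connected and non-bipartite: a triangle 0-1-2 with a pendant path 2-3-4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
stationary = {v: len(adj[v]) / (2 * num_edges) for v in adj}   # pi(v) = deg(v) / 2|E|
dist = random_walk_distribution(adj, start=4, steps=1000)
```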

Random walks on maps: A random walk on a planar graph tends to stay local, and it takes a long time to reach the stationary distribution. Reason: the network has sparse cuts.

Random walks on social networks: A random walk quickly moves to a "random place" (fast convergence). Reason: the network does not have sparse cuts.

Lazy random walk: In each step, move from the current vertex v to each neighbor u with probability 1/(2D), and stay at v with the remaining probability.

Convergence of lazy random walks: The stationary distribution is uniform.

Rate of convergence: It can be expressed in terms of the conductance of G or the second-largest eigenvalue of the transition matrix; O(log n) steps suffice if G is an expander graph.
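The same power-iteration idea, applied to the lazy walk with move probability 1/(2D) per neighbor. The example is a path, which is bipartite and irregular, so a simple random walk would not converge to the uniform distribution there; the lazy walk does. The function name and example are illustrative assumptions.

```python
def lazy_walk_distribution(adj, D, start, steps):
    """Exact distribution of a lazy random walk: each neighbor w.p. 1/(2D), stay otherwise."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, mass in dist.items():
            for u in adj[v]:
                nxt[u] += mass / (2 * D)                 # move to neighbor u
            nxt[v] += mass * (1 - len(adj[v]) / (2 * D))  # remaining probability: stay at v
        dist = nxt
    return dist

# Path 0-1-2-3 with degree bound D = 2.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = lazy_walk_distribution(path, D=2, start=0, steps=300)
```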

Conductance, Expanders & Small Worlds

Definition: The expansion Φ(U) of a set U is defined as

$$\Phi(U) = \frac{|\{(u,v) \in E : u \in U \text{ and } v \in V \setminus U\}|}{D\,|U|}.$$

The conductance Φ_G of G is $\Phi_G = \min_{U:\,1 \le |U| \le |V|/2} \Phi(U)$.

Definition: A graph G=(V,E) is called a φ-expander if Φ_G ≥ φ for some constant φ.

Interpretations: Expander graphs satisfy the "small-world phenomenon". Conductance can be viewed as a measure of the social connectivity of a network.
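For very small graphs, Φ(U) and Φ_G can be computed by brute force directly from the definitions. The sketch below is exponential in |V| and only meant to make the formulas concrete; the graph is given as adjacency lists (a hypothetical dict representation) with degree bound D.

```python
from itertools import combinations

def expansion(adj, D, U):
    """Phi(U) = |{(u, v) in E : u in U, v not in U}| / (D * |U|)."""
    U = set(U)
    cut = sum(1 for u in U for v in adj[u] if v not in U)  # each cut edge counted once
    return cut / (D * len(U))

def conductance(adj, D):
    """Phi_G = min over U with 1 <= |U| <= |V|/2 of Phi(U); brute force, tiny graphs only."""
    V = list(adj)
    return min(
        expansion(adj, D, U)
        for size in range(1, len(V) // 2 + 1)
        for U in combinations(V, size)
    )

cycle8 = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
```

On the 8-cycle the minimum is attained by four consecutive vertices (two cut edges), giving Φ_G = 2/(2·4) = 1/4.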

Testing Expanders

Facts: A lazy random walk converges to the uniform distribution, and it converges quickly in expander graphs.

Hope: A lazy random walk converges much more slowly if the graph is ε-far from an expander graph. In particular, we hope that the distribution of the endpoints of a Θ(log n)-step lazy random walk differs significantly from the uniform distribution.

Question: If so, how could we exploit this to design a property testing algorithm?

The Birthday Problem & Testing Uniform Distributions

Birthday problem: There are n possible birthdays and k persons, each with a birthday chosen uniformly at random. How large must k be so that, with constant probability, two persons have the same birthday?

Analysis: Let p = (1/n, …, 1/n)^T. Then ||p||² = 1/n is the collision probability of two birthdays. With k persons, the expected number of collisions is $\binom{k}{2}\|p\|^2$. So for k = Θ(√n) we expect to see a collision.
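As a worked instance (the numbers are our own illustration), the classical calendar case n = 365 already gives close to one expected collision with only k = 23 persons, in line with k = Θ(√n):

```latex
\mathbb{E}[\#\text{collisions}] = \binom{k}{2}\,\|p\|^2 = \frac{k(k-1)}{2n},
\qquad n = 365,\ k = 23:\quad \frac{23 \cdot 22}{2 \cdot 365} = \frac{253}{365} \approx 0.69.
```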

Testing Uniform Distributions

Observation: The uniform distribution minimizes the expected number of pairwise collisions: for the uniform distribution p and any distribution q on the same n elements, ||q||² = ||p||² + ||q − p||². If a distribution q differs significantly from the uniform distribution, then ||q||² ≫ ||p||².

TestUniformDistribution(distribution q)

1. Sample Θ(√n) elements according to q

2. If the number of pairwise collisions is too large, then reject

3. Else accept

Testing Expanders

TestingExpanders(G)

1. Sample a set S of s vertices uniformly at random

2. For each v ∈ S do

3. Let q be the distribution of endpoints of a Θ(log n)-step lazy random walk starting at v

4. If TestUniformDistribution(q) rejects, then reject

5. Accept

History
• The algorithm was proposed by [Goldreich and Ron, 2000], who conjectured it to be a property tester
• First complete analysis by [Czumaj and Sohler, 2010] (but weaker than conjectured)
• Later improved by [Nachmias and Shapira, 2010] and [Kale and Seshadhri, 2011]

Final Result

Theorem [Nachmias and Shapira, 2010; Kale and Seshadhri, 2011]: Algorithm TestingExpanders accepts every φ-expander and rejects every graph that is ε-far from a Θ(φ²)-expander. The algorithm has a running time of O(n^{1/2+δ}).

Key structural property of "ε-far" graphs: If G is ε-far from a Θ(φ²)-expander, then there exists a set U of Ω(εn) vertices with Φ(U) = O(φ²). This implies that for many vertices, the distribution of endpoints of a random walk of length O(log n) is significantly different from the uniform distribution.

Third Wrap-Up – Testing Expansion

(Lazy) random walks: move from a vertex to a random neighbor; converge to the uniform distribution; the speed of convergence depends on the graph structure

Testing expansion: A random walk converges quickly in expander graphs and more slowly if the graph is far from an expander. The number of collisions among the endpoints of random walks is minimized in expander graphs, so we can test expansion by counting collisions.

Graph Clustering & Web Communities

Web graph communities: A community is a set of vertices that induces an expander graph and has a sparse cut to the rest of the graph. Question: Is the web graph composed of at most k communities?

Definition: A subset C ⊆ V is called a (Φ_in, Φ_out)-cluster if

Φ_{G[C]} ≥ Φ_in (the induced subgraph G[C] has conductance at least Φ_in), and

Φ(C) ≤ Φ_out

Definition: A partition of V into at most k (Φ_in, Φ_out)-clusters is called a (k, Φ_in, Φ_out)-clustering.
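The cluster definition can be checked by brute force on tiny graphs. In this sketch (function names are hypothetical), the conductance of G[C] is normalized by the degree bound D of G, which is our assumption about how Φ_{G[C]} is measured; edges leaving C are ignored inside G[C].

```python
from itertools import combinations

def _phi(adj, D, U, within=None):
    """Expansion of U; if `within` is given, only edges ending inside `within` count."""
    U = set(U)
    cut = sum(1 for u in U for v in adj[u]
              if v not in U and (within is None or v in within))
    return cut / (D * len(U))

def is_cluster(adj, D, C, phi_in, phi_out):
    """Brute-force check that C is a (phi_in, phi_out)-cluster (tiny graphs only)."""
    C = set(C)
    inner = min(
        (_phi(adj, D, U, within=C)
         for size in range(1, len(C) // 2 + 1)
         for U in combinations(sorted(C), size)),
        default=float("inf"),          # a single vertex has no nontrivial cut
    )
    return inner >= phi_in and _phi(adj, D, C) <= phi_out

# Two triangles joined by the edge 2-3, degree bound D = 3.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
```

Here C = {0, 1, 2} has internal conductance 2/3 and Φ(C) = 1/9, so it is a (0.5, 0.2)-cluster but not a (0.5, 0.1)-cluster.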

Testing k-Clusterings

A simple case? Distinguish between a union of at most k expander graphs with no edges in between and a set of more than k (large) expander graphs with no edges in between.

Can we use our previous algorithm to test for a k-clustering?

[Figure: four expanders]

A simple case? No! We do not know the sizes of the clusters (expander graphs), and estimating the support size of a distribution is hard [Raskhodnikova et al., 2009].

New idea: If two vertices come from the same cluster, their random walks quickly converge to the same distribution. So we could try to sample a set of vertices and look for sets of vertices whose random walks induce the same distribution.

Testing Closeness of Distributions

Main idea [Batu et al. 2013; Chan et al. 2014]: If p ≈ q, then the following experiments should give roughly the same number of collisions between elements of S and T:

1. Draw two sets S and T of m elements each from p

2. Draw two sets S and T of m elements each from q

3. Draw a set S of m elements from p and a set T of m elements from q

If p and q differ significantly, at least one of the three values is different.
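One way to turn the three experiments into a single statistic, sketched under the assumption that p and q are given as sampling functions: the rate of cross collisions between S and T is an unbiased estimate of the inner product of the two sampling distributions, so combining the three rates estimates ||p||² + ||q||² − 2⟨p, q⟩ = ||p − q||². The function names are illustrative.

```python
import random
from collections import Counter

def cross_collisions(S, T):
    """Number of pairs (x, y) in S x T with x == y."""
    cs, ct = Counter(S), Counter(T)
    return sum(cs[x] * ct[x] for x in cs)

def estimate_l2_distance_sq(sample_p, sample_q, m):
    """Estimate ||p - q||^2 from the three experiments, each with sets of size m."""
    def draw(sampler):
        return [sampler() for _ in range(m)]
    pp = cross_collisions(draw(sample_p), draw(sample_p)) / m**2   # ~ ||p||^2
    qq = cross_collisions(draw(sample_q), draw(sample_q)) / m**2   # ~ ||q||^2
    pq = cross_collisions(draw(sample_p), draw(sample_q)) / m**2   # ~ <p, q>
    return pp + qq - 2 * pq

rng = random.Random(1)
# Disjoint uniform distributions on 10 elements each: ||p - q||^2 = 0.1 + 0.1 = 0.2.
far = estimate_l2_distance_sq(lambda: rng.randrange(10), lambda: 10 + rng.randrange(10), 2000)
# Identical distributions: ||p - q||^2 = 0.
same = estimate_l2_distance_sq(lambda: rng.randrange(100), lambda: rng.randrange(100), 2000)
```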

Theorem [Batu et al. 2013; Chan et al. 2014]: There is a tester that, with probability at least 2/3, accepts if ||p − q|| ≤ ε/2 and rejects if ||p − q|| ≥ ε. The query complexity of the algorithm is O(√b/ε²), where b is an upper bound on ||p||² and ||q||².

We will need b to be O(1/n)

The Algorithm

ClusteringTest

1. Sample a set S of s vertices uniformly at random

2. For every v ∈ S, let D(v) be the distribution of endpoints of a random walk of length Θ(log n) starting at v

3. For each pair u, v ∈ S do

4. If D(u) and D(v) are close, then add an edge (u, v) to the "cluster graph" on vertex set S

5. Accept if and only if the cluster graph is a collection of at most k cliques
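Step 5 can be checked efficiently: a graph is a disjoint union of cliques exactly when adjacent vertices always have the same closed neighborhood, and the number of cliques is then the number of distinct closed neighborhoods. In the sketch below, `close` is a hypothetical predicate standing in for "D(u) and D(v) are close".

```python
from itertools import combinations

def is_at_most_k_cliques(S, close, k):
    """Build the cluster graph on S and test whether it is a disjoint union of <= k cliques."""
    nbrs = {v: set() for v in S}
    for u, v in combinations(S, 2):
        if close(u, v):
            nbrs[u].add(v)
            nbrs[v].add(u)
    closed = {v: frozenset(nbrs[v] | {v}) for v in S}   # closed neighborhoods N[v]
    # In a union of cliques, every vertex's closed neighborhood is its whole clique,
    # so adjacent vertices must have identical closed neighborhoods.
    if any(closed[u] != closed[v] for v in S for u in nbrs[v]):
        return False
    return len(set(closed.values())) <= k               # one neighborhood per clique
```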

Testing k-Clusterings

Observation: Algorithm ClusteringTest distinguishes between at most k expanders and more than k (large) expanders.

Can we generalize it to testing (k, Φ_in, Φ_out)-clusterings?

Testing k-Clusterings - Soundness

Challenge: Since the clusters may be connected to each other in a (k, Φ_in, Φ_out)-clustering, the stationary distribution may be uniform over G (and not over the cluster). We need to show that, for the proper length of the random walk, there is an "intermediate" distribution that is "reasonably stable" with respect to the ℓ2-error.

The Algorithm

ClusteringTest

1. Sample a set S of s vertices uniformly at random

2. For every v ∈ S, let D(v) be the distribution of endpoints of a random walk of length Θ(log n) starting at v

3. If ||D(v)||² exceeds c/n for some v ∈ S (for a suitable constant c), then reject

4. For each pair u, v ∈ S do

5. If D(u) and D(v) are close, then add an edge (u, v) to the "cluster graph" on vertex set S

6. Accept if and only if the cluster graph has at most k connected components
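An end-to-end Python sketch of the revised ClusteringTest. It estimates ||D(v)||² and ||D(u) − D(v)||² from collision counts among sampled lazy-walk endpoints and counts the connected components of the cluster graph with union-find. The thresholds norm_const/n and tau/n, the number m of walks, and the walk length are hypothetical parameters chosen for the toy example, not values from the analysis.

```python
import random
from collections import Counter
from itertools import combinations

def lazy_endpoints(adj, D, v, steps, m, rng):
    """Endpoints of m independent lazy random walks of `steps` steps started at v."""
    ends = []
    for _ in range(m):
        x = v
        for _ in range(steps):
            r = rng.randrange(2 * D)
            if r < len(adj[x]):          # move to neighbor r with probability 1/(2D)
                x = adj[x][r]
        ends.append(x)
    return ends

def inner_product(A, B):
    """Cross-collision rate between samples A and B; estimates <D_A, D_B>."""
    ca, cb = Counter(A), Counter(B)
    return sum(ca[x] * cb[x] for x in ca) / (len(A) * len(B))

def clustering_test(adj, D, k, s, steps, m, norm_const, tau, rng=random):
    """Sketch of the revised ClusteringTest; True = accept, False = reject."""
    n = len(adj)
    S = rng.sample(list(adj), s)                           # step 1
    # Step 2: two independent endpoint samples per start vertex keep estimates unbiased.
    X = {v: (lazy_endpoints(adj, D, v, steps, m, rng),
             lazy_endpoints(adj, D, v, steps, m, rng)) for v in S}
    norm = {v: inner_product(*X[v]) for v in S}            # estimates ||D(v)||^2
    if any(norm[v] > norm_const / n for v in S):           # step 3
        return False
    parent = {v: v for v in S}                             # union-find on the cluster graph

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(S, 2):                        # steps 4 and 5
        dist_sq = norm[u] + norm[v] - 2 * inner_product(X[u][0], X[v][0])
        if dist_sq <= tau / n:                             # "D(u) and D(v) are close"
            parent[find(u)] = find(v)
    return len({find(v) for v in S}) <= k                  # step 6

def cliques(c, size):
    """c disjoint cliques with `size` vertices each (toy clustered graph)."""
    return {i: [j for j in range(size * (i // size), size * (i // size + 1)) if j != i]
            for i in range(c * size)}
```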

Testing k-Clusterings - Completeness

Required properties of a (k, Φ_in, Φ_out)-clustering: For most vertices v, the distribution D(v) of endpoints of a lazy random walk of proper length has ||D(v)||² = O(1/n). For most pairs u, v from the same cluster, ||D(v) − D(u)||² is very small.

Useful tool: the higher-order Cheeger inequality [Lee et al. 2014], which relates (k, Φ_in, Φ_out)-clusterings to the k+1 largest eigenvalues

Testing k-Clusterings - Soundness

Structural property of "ε-far" graphs (similar to expanders): If G is ε-far from every (k, Φ*_in, Φ*_out)-clustering, then there exists a partition into k+1 sets C_1, …, C_{k+1}, each with Ω(ε²n/k) vertices and with Φ(C_i) = O(Φ*_in/ε²).

Testing k-Clusterings

Theorem [Czumaj, Peng, Sohler, 2015]: Algorithm ClusteringTest accepts every (k, Φ_in, Φ_out)-clustering with probability at least 2/3 and rejects every graph that is ε-far from every (k, Φ*_in, Φ*_out)-clustering with probability at least 2/3, where Φ_out = O(ε⁴Φ_in²) and Φ*_in = Θ(ε⁴Φ_in²/log n), for constants k and D.

The running time of the algorithm is O*(√n).

Fourth Wrap-Up

Testing clusterings: The endpoints of a random walk of proper length should be uniform on its cluster, with little probability mass "outside" the cluster. If random walks start from two different vertices of the same cluster, their endpoint distributions are similar. Collision statistics can be used to test the similarity of distributions pairwise, and this can be used to approximate the cut structure.

Take-away message: The distribution of endpoints of random walks (possibly comparing different starting vertices) contains a lot of information about the cut structure of a graph.

Summary

Vision: learning from very large sets of massive graphs

Approach: feature computation by random sampling; analysis in the framework of property testing

Two examples: expanders (a measure of connectivity in social networks) and clustering (the structure of social networks)

Thank you!

Image sources

Slide 2: Allan Ajifo and cobalt123; Creative Commons license

Slide 3: GustavoG and Jasper Nance; Creative Commons license

Slide 4: Wikipedia; Jason Brown; Creative Commons license

Slide 5: GustavoG; Creative Commons license

Slide 6: GoldenRibbon; Creative Commons license