Transcript of: Graph Partitioning and Clustering in Theory and Practice - Christian Schulz
Christian Schulz: Graph Partitioning and Clustering
Department of Informatics, Institute for Theoretical Computer Science
Graph Partitioning and Clustering in Theory and Practice
Christian Schulz
www.kit.edu
Organisation: Lecturer
Christian Schulz
Karlsruhe Institute of Technology
Mail: [email protected]
Room: 206
Consultation: Sat. 00:00-01:00
Organisation
Lecture ⇒ website; several seminar rooms (see website), Am Fasanengarten 5
Exam: 5 ECTS credits; 80% oral exam, 20% mini seminar
VF: 1 (Theoretical Foundations), 2 (Algorithm Engineering)
Website: http://algo2.iti.kit.edu/gpgc_lecture.php
User: gpvl
Password: 2015
Mailing list: send me an email if you want to participate in the lecture!
Helpful Knowledge
Informatics I/II ↔ Algorithms I
Algorithm Engineering ↔ Algorithms II
Basic linear algebra
Lecture: relevant things will be repeated
Area of specialization: Algorithm Engineering
Material
Slides, script, blackboard (downloadable); scientific publications (see website)
Basics:
1. Introduction to Algorithms [CLRS]
2. Algorithms and Data Structures [Mehlhorn, Sanders]
Graph Partitioning:
1. High Quality Graph Partitioning [Schulz]
2. Book: Graph Partitioning [Bichot, Siarry]
Graph Clustering:
1. An Algorithmic Walk from Static to Dynamic Graph Clustering [Görke]
2. Many of the clustering slides are due to Robert Görke and Dorothea Wagner
Exercise Task
Mini seminar: 20% of the grade; 15 min talk about a scientific paper; topics on website
Depending on the number of participants: 1-2 lectures.
Exercise Task
A novel partitioning method for block-structured adaptive meshes
SONIC: Streaming Overlapping Community Detection
Acyclic Partitioning of Large-Scale Directed Acyclic Graphs
Multilevel Algorithms for Multi-Constraint Graph Partitioning
Multithreaded Graph Partitioning
Scalable Multilevel Support Vector Machines
3D Cell Nuclei Segmentation with Balanced Graph Partitioning
A Graph Partitioning Model of Congressional Redistricting
A k-way Greedy Graph Partitioning with Initial Fixed Vertices for Parallel Applications
A Parallel Hill-Climbing Refinement Algorithm for Graph Partitioning
Partitioning Trillion-edge Graphs in Minutes
Graph Partitioning using Quantum Annealing on the D-Wave System
Open Research Projects
we have lots of them!
Lecture Overview
Today:
Fundamentals, Problem Definitions and Objective Functions
Lots of Applications
Later:
NP-Hardness of GP and GC
Exact Partitioning/Clustering
Spectral Partitioning/Clustering
Local Search, Multilevel Algorithms
Parallel, External and Semi-External Algorithms
Greedy Agglomeration / Top-Down Approaches
Min-Cut Tree Clustering
Evolutionary Algorithms and Meta-Heuristics
Dynamic Clustering, Online Algorithms
Algorithm Engineering
[Figure: the Algorithm Engineering cycle - design, analyze, implement, experiment]
Modelling reality is hard; finding optima is hard; satisfying the needs of the application is hard.
We still need clustering/partitioning ⇒ we need a good foundation.
Fundamentals
Blackboard
ε-Balanced Graph Partitioning: Definition
Partition graph G = (V, E, c : V → ℝ>0, ω : E → ℝ>0) into k disjoint blocks V_1, ..., V_k s.t.
the total node weight of each block satisfies c(V_i) ≤ (1 + ε) ⌈c(V)/k⌉
and the objective is as small as possible.
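As a concrete illustration, here is a minimal Python sketch of the two ingredients of the definition: the balance constraint c(V_i) ≤ (1 + ε)⌈c(V)/k⌉ and the edge-cut objective. The helper names and the toy instance are hypothetical, not from the lecture.

```python
import math

def is_eps_balanced(blocks, c, k, eps):
    # Balance constraint: c(V_i) <= (1 + eps) * ceil(c(V) / k) for every block
    total = sum(c.values())
    limit = (1 + eps) * math.ceil(total / k)
    return all(sum(c[v] for v in block) <= limit for block in blocks)

def cut_weight(blocks, edges, omega):
    # Edge-cut objective: total weight of edges crossing between blocks
    block_of = {v: i for i, block in enumerate(blocks) for v in block}
    return sum(omega[e] for e in edges if block_of[e[0]] != block_of[e[1]])

# hypothetical toy instance: two triangles joined by one edge, unit weights
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
c = {v: 1 for v in range(6)}
omega = {e: 1 for e in edges}
blocks = [{0, 1, 2}, {3, 4, 5}]
print(is_eps_balanced(blocks, c, 2, 0.0))  # True
print(cut_weight(blocks, edges, omega))    # 1
```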
Parallel Applications
Model of Computation/Communication:
1. Each volume (data, calculation) is represented by a vertex
2. Interdependencies are represented by edges
All PEs get the same amount of work; communication is expensive.
Graph Partitioning Problem: partition a graph into (almost) equally sized blocks, such that the number of edges connecting vertices from different blocks is minimal.
Graph Partitioning: Common Objectives
Total cut size (by far the most commonly used): ∑_{i<j} ω(E_{i,j})
very simple definition
exact solutions, transfer to multilevel algorithms (later)
on easy simulation graphs: correlated with more realistic metrics
on highly unstructured graphs: not as well correlated [Buluc, Madduri'12]
Graph Partitioning: Common Objectives
Maximum (total) communication volume:
MCV(P) := max_p ∑_{v ∈ V_p} |{ V_i : ∃ {u, v} ∈ E, u ∈ V_i ≠ V_p }|
motivated by parallel applications (e.g. MatVecs); a better model of "reality"
[Figure: communication between Processor I and Processor II]
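The MCV definition above can be evaluated directly: for each node, count how many foreign blocks contain one of its neighbours, and sum per block. A minimal sketch; the function name and toy instance are hypothetical.

```python
def max_comm_volume(blocks, edges):
    # MCV(P) = max over blocks p of sum over v in V_p of the number of
    # other blocks V_i that contain a neighbour u of v
    block_of = {v: i for i, b in enumerate(blocks) for v in b}
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return max(
        sum(len({block_of[u] for u in nbrs.get(v, set())} - {p}) for v in block)
        for p, block in enumerate(blocks)
    )

# hypothetical toy instance: two triangles joined by the edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(max_comm_volume([{0, 1, 2}, {3, 4, 5}], edges))  # 1
```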
Graph Partitioning: More Objectives
additionally max quotient graph degree? Each message sent incurs start-up overhead; wait for incoming messages of adjacent PEs
conductance or expansion (sometimes; definition later)
application-dependent metrics (see for example CRP)
combination of metrics in a Pareto way?
Finite Element Method: Solve real-world problems!
solve large, sparse linear equation systems Ax = b
sparse := number of non-zeros ∈ O(n)
distribute workload, minimize communication
An Example Problem (Excursion)
Compute the heat distribution in a region. Let Ω be a region:
−∆u = f in Ω
u = 0 on ∂Ω
Weak formulation: find u ∈ V such that
a(u, v) = (f, v)_{L²(Ω)} ∀v ∈ V,
where a(u, v) := ∫_Ω ∇u · ∇v dx is a continuous bilinear form.
Convert to a Linear Equation System (Excursion)
Choosing V_h := span{φ_1, ..., φ_n} ⊂ V yields the approximation: find u_h ∈ V_h such that
a(u_h, φ) = (f, φ)_{L²(Ω)} ∀φ ∈ V_h
Since u_h ∈ V_h ⇔ u_h = ∑_{i=1}^n α_i φ_i (α_i ∈ ℝ), this means:
find α_i ∈ ℝ such that
a(∑_{i=1}^n α_i φ_i, φ) = (f, φ)_{L²(Ω)} ∀φ ∈ V_h
Convert to a Linear Equation System (Excursion)
a(∑_{i=1}^n α_i φ_i, φ) = (f, φ)_{L²(Ω)} ∀φ ∈ V_h
⇔ ∑_{i=1}^n α_i a(φ_i, φ_j) = (f, φ_j)_{L²(Ω)} ∀j ∈ {1, ..., n}
⇔ Ax = b, with
A = ( a(φ_i, φ_j) )_{i,j = 1, ..., n},  b = ( (f, φ_1), ..., (f, φ_n) )ᵀ
Solving Linear Equation Systems (Excursion)
Typically solved via iterative methods, e.g. the Jacobi method; requires sparse matrix-vector multiplication:
x^{k+1} = D^{-1}(b − R x^k)
where D = diag(A), R = A − D.
Main goal: distribute the columns of A and the entries of x evenly over the processors such that communication is minimized.
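The iteration x^{k+1} = D^{-1}(b − R x^k) is easy to write down componentwise; here is a minimal dense-matrix Python sketch. The 2×2 system is a hypothetical example, chosen diagonally dominant so that Jacobi converges.

```python
def jacobi(A, b, iters=100):
    # x^{k+1} = D^{-1}(b - R x^k) with D = diag(A), R = A - D,
    # written out componentwise: x_i = (b_i - sum_{j != i} a_ij * x_j) / a_ii
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [5.0, 4.0]
x = jacobi(A, b)  # converges to the solution [1.0, 1.0]
```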
Solving Linear Equation Systems (Excursion)
x_i^{k+1} = (1 / a_ii) ( b_i − ∑_{j ≠ i} a_ij x_j^k )
Define a model of computation and communication:
graph G = (V = {1, ..., n}, E), one node per column of the matrix
e = {i, j} ∈ E ⇔ a_ij ≠ 0
associate x_i^k, x_i^{k+1}, A_{i,*} with node i ∈ V
Example:
A = ( 0 3 0 2
      3 0 1 0
      0 1 0 0
      2 0 0 0 )
[Figure: the corresponding graph on nodes 1-4]
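The node-per-column model above can be sketched directly for the slide's 4×4 example matrix; assuming a symmetric matrix, only the upper triangle needs to be scanned. The helper name is hypothetical.

```python
def matrix_to_graph(A):
    # one node per row/column; edge {i, j} iff a_ij != 0 (symmetric A assumed)
    n = len(A)
    V = list(range(n))
    E = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i][j] != 0]
    return V, E

# the 4x4 example matrix from the slide (nodes 1-4 become 0-3)
A = [[0, 3, 0, 2],
     [3, 0, 1, 0],
     [0, 1, 0, 0],
     [2, 0, 0, 0]]
V, E = matrix_to_graph(A)
print(E)  # [(0, 1), (0, 3), (1, 2)]
```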
Solving Linear Equation Systems (Excursion, cont.)
partition G: PE l obtains the columns associated with V_l,
stores x_i^k, x_i^{k+1}, A_{i,*} and computes x_i^{k+1} for i ∈ V_l
partitioning ⇒ equal workload, less communication
Solving Linear Equation Systems (Excursion)
Other important examples: PageRank! Graph-based MapReduce systems (e.g. Pregel, GPS, ...), ...
Discussion: is the workload really balanced in the previous model? Better models? Better objectives? Hypergraph partitioning (later)?
Customizable Route Planning [DGPW'11]
Goals: fast shortest-path queries; fast exchange of the underlying distance metric
Three stages:
metric-independent preprocessing → graph partitioning
metric customization → update data structures
query stage → find shortest s-t paths
Customizable Route Planning: Basic Algorithm - Partition the Road Network
block size constraint |V_i| ≤ U; connected blocks!
Customizable Route Planning: Build Metric-Dependent Data Structures
build overlay network H:
contains all boundary nodes
contains all cut edges
for each block a clique: for each pair of boundary nodes u, v in V_i → create edge (u, v) with cost := shortest-path distance within V_i
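The clique construction can be sketched with a Dijkstra search restricted to one block; `overlay_clique` and the tiny block below are hypothetical names and data, not CRP's actual implementation.

```python
import heapq

def overlay_clique(block_nodes, boundary, adj):
    # For each pair of boundary nodes of one block, create an overlay edge
    # whose cost is the shortest-path distance *inside* the block.
    def dijkstra(src):
        dist = {src: 0}
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in adj.get(u, []):
                if v in block_nodes and d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return dist
    clique = {}
    for u in boundary:
        dist = dijkstra(u)
        for v in boundary:
            if v != u:
                clique[(u, v)] = dist.get(v, float("inf"))
    return clique

# hypothetical block {a, b, c} with boundary nodes a and c
adj = {"a": [("b", 1)], "b": [("a", 1), ("c", 2)], "c": [("b", 2)]}
clique = overlay_clique({"a", "b", "c"}, ["a", "c"], adj)
print(clique[("a", "c")])  # 3
```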
Customizable Route Planning: Queries
source s and target t: build the union of H, C_s and C_t,
where C_s / C_t is the subgraph of the block containing s / t
search for a shortest path in this union
Customizable Route Planning
queries 3000 times faster than plain Dijkstra
used by Bing Maps
real life: lots of engineering (→ route planning lecture)
More route planning using graph partitioning:
Arc-Flags [Möhring et al.]
Alternative Route Planning [Schieferdecker et al.]
Customizable Contraction Hierarchies [Dibbelt et al.]
...
Further Applications
A ∈ ℝ^{n×n}, Ax = b ∈ ℝ^n
Graph Clustering: Definition
Partition graph G = (V, E, ω : E → ℝ>0) into disjoint blocks V_1, ..., V_{k'} (k' not given in advance) s.t.
the objective is as small as possible; sometimes a size constraint.
[Figure: social network of named people, grouped into clusters]
k' = 1: the trivial clustering; k' = n: singletons
Clustering: Intuition to Formalization
Task: partition the graph into natural groups
Paradigm: intra-cluster density vs. inter-cluster sparsity
Mathematical formalization: quality measures for clusterings
Many such measures exist; optimizing them is generally NP-hard, and there is no single, universally best strategy.
Postulates for a Measure
Given a graph G and a clustering 𝒞, a quality measure should behave as follows:
more intra-cluster edges ⇒ higher quality
fewer inter-cluster edges ⇒ higher quality
clusters must be connected
random clusterings should have bad quality
disjoint cliques should approach maximum quality
locality of the measure
double the instance - what should happen? ... the same result
comparable results across instances
fulfill the desiderata of the application
...
A Theorem of Impossibility [Kleinberg'02]
A warning theorem from the field of data clustering:
Theorem: Given a set S, let f : d ↦ Γ be a function that maps a distance function d on S to a partition Γ. No function f can simultaneously fulfill all of the following:
Scale invariance: for any distance function d and any α > 0, we have f(d) = f(α · d)
Richness: for any given partition Γ of the set, there is a distance function d such that f(d) = Γ
Consistency: if d' is built from d by reducing intra-cluster distances and increasing inter-cluster distances, then f(d') = f(d)
Mathematical Formalization: Bottleneck
[Figure: social network, with the upper cluster and two candidate cuts highlighted]
Quality of the clustering, upper cluster:
inter-cluster sparsity: 2 edges for cutting off 7 nodes (cheap)
intra-cluster density: best additional cut: 3 edges for cutting off 4 nodes (expensive)
Examples: Conductance, Expansion
conductance of a cut (C, V \ C):
ϕ(C, V \ C) := ω(E(C, V \ C)) / min( ∑_{v ∈ C} deg(v), ∑_{v ∈ V \ C} deg(v) )
(i.e. the thickness of the bottleneck which cuts off C)
inter-cluster conductance(𝒞) := 1 − max_{C ∈ 𝒞} ϕ(C, V \ C)
(i.e. 1 − the worst bottleneck induced by some C ∈ 𝒞)
intra-cluster conductance(𝒞) := min_{C ∈ 𝒞} min_{P ⊎ Q = C} ϕ|_C(P, Q)
(i.e. the best bottleneck still left uncut inside some C ∈ 𝒞)
expansion of a cut (C, V \ C):
ψ(C, V \ C) := ω(E(C, V \ C)) / min(|C|, |V \ C|)
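For unweighted graphs, both cut measures above reduce to simple counting; a minimal sketch (C is a set of nodes; the helper names and toy instance are hypothetical):

```python
def conductance(C, edges):
    # phi(C, V\C) = cut(C, V\C) / min(vol(C), vol(V\C)) for an unweighted graph
    nodes = {u for e in edges for u in e}
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    cut = sum(1 for u, v in edges if (u in C) != (v in C))
    return cut / min(sum(deg[v] for v in C), sum(deg[v] for v in nodes - C))

def expansion(C, edges):
    # psi(C, V\C) = cut(C, V\C) / min(|C|, |V\C|)
    nodes = {u for e in edges for u in e}
    cut = sum(1 for u, v in edges if (u in C) != (v in C))
    return cut / min(len(C), len(nodes - C))

# hypothetical toy instance: two triangles joined by one edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(conductance({0, 1, 2}, edges))  # 1/7 (cut of weight 1, both volumes 7)
print(expansion({0, 1, 2}, edges))    # 1/3
```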
Mathematical Formalization: Counting Edges
[Figure: social network with clustering]
Measuring clustering quality by counting edges:
inter-cluster sparsity: 6 edges out of ca. 800 node pairs (few)
intra-cluster density: 53 edges out of 99 node pairs (many)
example: quality measure coverage = #intra-cluster edges / #edges ≈ 0.9
Example Counting Measures
coverage: cov(𝒞) := #intra-cluster edges / #edges
(i.e. the fraction of covered edges)
performance: perf(𝒞) := (#intra-cluster edges + #absent inter-cluster edges) / (n(n−1)/2)
(i.e. the fraction of correctly classified node pairs)
density: den(𝒞) := (1/2) · #intra-cluster edges / #possible intra-cluster edges + (1/2) · #absent inter-cluster edges / #possible inter-cluster edges
(i.e. the fractions of correct intra- and inter-edges)
modularity: mod(𝒞) := cov(𝒞) − E[cov(𝒞)]
(i.e. how clear is the clustering compared to a random network?)
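Coverage and modularity can be sketched in a few lines; the last call also illustrates why coverage alone is misleading: the all-in-one clustering gets cov = 1 but modularity 0. The helper names and toy instance are hypothetical.

```python
def coverage(blocks, edges):
    # cov(C) = #intra-cluster edges / #edges
    block_of = {v: i for i, b in enumerate(blocks) for v in b}
    return sum(1 for u, v in edges if block_of[u] == block_of[v]) / len(edges)

def modularity(blocks, edges):
    # mod(C) = cov(C) - (1 / 4m^2) * sum over clusters of (sum of degrees)^2
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    expected = sum(sum(deg[v] for v in b) ** 2 for b in blocks) / (4 * m * m)
    return coverage(blocks, edges) - expected

# hypothetical toy instance: two triangles joined by one edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(coverage([{0, 1, 2}, {3, 4, 5}], edges))    # 6/7
print(modularity([{0, 1, 2}, {3, 4, 5}], edges))  # 6/7 - 1/2 = 5/14
print(modularity([set(range(6))], edges))         # 0.0: one big cluster gains nothing
```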
Motivation for Modularity
[Figure: social network with clustering]
cov(𝒞) = #intra-cluster edges / #edges ≈ 0.9
only one cluster ⇒ cov(𝒞') = 1.0
A Promising Remedy: Modularity [Girvan and Newman'04]
subtract the expected value from coverage → a useful measure:
mod(𝒞) := cov(𝒞) − E[cov(𝒞)] = m(𝒞)/m − (1/4m²) ∑_{C ∈ 𝒞} ( ∑_{v ∈ C} deg(v) )²
probability model:
1. random graphs, keeping the same clustering
2. goal: keep expected node degrees, randomly throw in edges
3. the edge set is a multi-set
Probability Space of Modularity
Intuition: keep expected node degrees, randomly throw in edges
1. start with the node set V
2. keep expected degrees
3. an edge end attaches to node v with p = deg(v)/2m
4. the other end attaches to w with p = deg(w)/2m
5. P[e = (v, w)] = deg(v) · deg(w) / 4m²
6. P[e = {v, w}] = deg(v) · deg(w) / 2m²
7. E[cov(𝒞)] = (1/4m²) ∑_{C ∈ 𝒞} ( ∑_{v ∈ C} deg(v) )²
[Figure: example graph with node degrees and clustering]
mod(𝒞) = m(𝒞)/m − (1/4m²) ∑_{C ∈ 𝒞} ( ∑_{v ∈ C} deg(v) )²
Modularity: Example
[Figure: example graph G with clusters C_1 = {u, v, w} and C_2, edge probabilities annotated]
cov = 1/2, E[cov] = 5/8
⇒ mod(𝒞) = 1/2 − 5/8 = −1/8 (bad!)
Modularity in Practice
easy to use & implement; reasonable behavior on many practical instances
heavily used in various fields:
ecosystem exploration
collaboration analyses
biochemistry
structure of the internet (AS-graph, WWW, routers)
close to human intuition of quality [Görke'10]
non-locality of optimal clusterings [folklore]
resolution limit [Fortunato and Barthelemy'07]
application-specific null models?
Modularity, Algorithmic Theory
The complexity of modularity optimization:
finding 𝒞 with maximum modularity is NP-hard (reduction from 3-PARTITION)
the restriction to |𝒞| = 2 is also hard ⇒ not FPT w.r.t. |𝒞|
greedy maximization (later) does not approximate
only very limited graph families are combinatorially solvable
ILP formulation, feasible for roughly |V| ≤ 200 (later)
diverse results on approximability for specific classes of graphs [DasGupta, Devine 2011]
Surprise [Arnau et al.]
given a clustering 𝒞 of G with clusters V_1, ..., V_ℓ:
Surprise := probability that a random graph R has at least as many intra-cluster edges,
S(𝒞) = P( m_R(𝒞) ≥ m_G(𝒞) )
random model: all graphs labeled with V that have exactly m edges
the smaller S(𝒞), the more surprising the intra-cluster edges are
Deriving a practical formula:
total number of possible pairs: p = n(n − 1)/2
maximum number of pairs within clusters: M = ∑ |V_i|(|V_i| − 1)/2
urn model (without replacement): M white balls (possible edges within clusters), p − M black balls (possible edges between clusters) → hypergeometric distribution:
S(𝒞) = ∑_{i = m_G(𝒞)}^{m} C(M, i) · C(p − M, m − i) / C(p, m),
where C(a, b) denotes the binomial coefficient.
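The hypergeometric tail above can be evaluated directly with math.comb; `surprise` is a hypothetical helper name taking the node count n, edge count m, the cluster sizes and the observed number of intra-cluster edges.

```python
from math import comb

def surprise(n, m, cluster_sizes, intra_edges):
    # S(C) = sum_{i = m_G(C)}^{m} C(M, i) * C(p - M, m - i) / C(p, m)
    p = n * (n - 1) // 2                              # all node pairs
    M = sum(s * (s - 1) // 2 for s in cluster_sizes)  # intra-cluster pairs
    return sum(comb(M, i) * comb(p - M, m - i)
               for i in range(intra_edges, min(m, M) + 1)) / comb(p, m)

# hypothetical toy instance: 4 nodes, two clusters of size 2, m = 2 edges
print(surprise(4, 2, [2, 2], 2))  # 1/15: both edges intra-cluster is surprising
print(surprise(4, 2, [2, 2], 0))  # 1.0: "at least 0 intra edges" is certain
```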
Clustering Applications
A simplified ecosystem: the Antarctic food web (source: Antarctica)
cluster ≈ self-sustaining / indivisible subsystem
Company-internal email traffic; groups are departments
Excerpt of the network of Amazon recommendations, around "VW Beetle Repairs" (source: [Gaertler'07])
cluster ≈ customer profile
Molecular structure of a protein: Ca²⁺/calmodulin-dependent kinase II (CaMKII) (source: protein database www.rcsb.org)
cluster ≈ functional unit (domain) of a protein
protein interactions(source: Max-Delbrück-Centre for molecular medicine, www.mdc-berlin.de)
cluster ≈ isolatable seat of disease
Scaling of Real-World Instances
"Zachary's Karate Club" (vertices/edges = 34/78)
"US college football" teams and matches (vertices/edges = 115/616)
Physical Internet: autonomous systems (vertices/edges ≈ 20K/60K)
instance                  vertices   edges
Coauthors in DBLP         300K       1M
Web .UK-domain '07        105M       3.3G
OSM road network          910M       2.1G
Facebook                  800M       68G
Neurons in human brain    ≈ 10^11    ∼ 10^17
Clustering vs. Partitioning
                  clustering          partitioning
purpose           analysis            handling instances
... and then?     zoom/abstraction    work on blocks
# of blocks       open                predefined
size of blocks    open                upper bound
criteria          various             various
constraints       often none          single, multiple
What's Next: A Little Bit of Theory
NP-hardness of GP and GC
No finite-factor approximation for perfectly balanced GP
Integer linear programs for both
Spectral graph partitioning and clustering
NP-Hardness: Excursion / Basic Toolbox
Decision problem: an algorithm decides whether the input is in a given set M. Example, the Hamilton cycle problem:
M := { G = (V, E) | ∃C ⊆ E : |C| = |V|, C a cycle }
P: the set of all (decision) problems that can be solved in O(n^d), where d is a constant and n is the number of bits needed to represent the input
NP: the set of all decision problems for which the instances where the answer is "yes" have efficiently verifiable proofs that the answer is yes. Example: the Hamilton cycle problem
NP-Hardness: Excursion / Basic Toolbox
Let A ⊆ Σ* and B ⊆ Γ* be languages.
A ≤_p B (A is polynomial-time reducible to B) :⇔ ∃f : Σ* → Γ* : ∀w ∈ Σ* : w ∈ A ⇔ f(w) ∈ B, with f computable in polynomial time.
A ≤_p B, B ∈ NP ⇒ A ∈ NP
A ≤_p B, B ∈ P ⇒ A ∈ P
A is NP-hard :⇔ ∀L ∈ NP : L ≤_p A
A is NP-complete :⇔ A is NP-hard and A ∈ NP
Proof recipe to show B is NP-hard: take an NP-hard problem A and show A ≤_p B ⇒ ∀L ∈ NP : L ≤_p A ≤_p B
NP-Hard Problems
[Figure: tree of ≤_p reductions among NP-hard problems, rooted at SAT]
Problems shown: SAT, ILP, 3SAT, PARTITION, BIN PACKING, SUBSET SUM, COLORING, KNAPSACK, CLIQUE, VERTEX COVER, STEINER TREE, SET COVER, SIMPLE MAX CUT, BALANCED PARTITION
Max-Cut ≤p Balanced Partition
Given a graph G = (V, E) and a number W > 0, is there a cut (S, V \ S) such that
|{ {u, v} ∈ E | u ∈ S, v ∈ V \ S }| ≥ W ?
NP-complete [Garey, Johnson, Stockmeyer'74]
Balanced Partition: is there a partition V = V_1 ∪ V_2, |V_1| = |V_2|, such that
|{ {u, v} ∈ E | u ∈ V_1, v ∈ V_2 }| ≤ W ?
Max-Cut ≤p Balanced Partition
Recall: A ≤_p B :⇔ ∃ polynomial-time computable f : Σ* → Γ* with w ∈ A ⇔ f(w) ∈ B for all w ∈ Σ*.
Given a Max-Cut instance G = (V, E), W > 0, define a Balanced Partition instance:
V' := V ∪ {u_1, ..., u_|V|}
E' := { {u, v} | u, v ∈ V' and {u, v} ∉ E }
W' := n² − W
We now show that the transformation is correct!
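The construction of the Balanced Partition instance (n fresh padding vertices, the complement edge set, threshold W' = n² − W) can be sketched directly; the function name and the tiny instance are hypothetical.

```python
from itertools import combinations

def maxcut_to_balanced_partition(V, E, W):
    # V' = V plus n fresh padding vertices, E' = complement of E on V',
    # W' = n^2 - W (the construction from the slide)
    n = len(V)
    pad = [("u", i) for i in range(n)]  # hypothetical names for u_1, ..., u_n
    Vp = list(V) + pad
    old = {frozenset(e) for e in E}
    Ep = [{a, b} for a, b in combinations(Vp, 2) if frozenset({a, b}) not in old]
    return Vp, Ep, n * n - W

# hypothetical Max-Cut instance: 3 vertices and a single edge, W = 1
Vp, Ep, Wp = maxcut_to_balanced_partition([0, 1, 2], [(0, 1)], 1)
print(len(Vp), len(Ep), Wp)  # 6 vertices, C(6,2) - 1 = 14 edges, W' = 8
```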
Max-Cut ≤_p Balanced Partition: w ∈ Max-Cut ⇒ f(w) ∈ Balanced Partition
Given a graph G = (V, E) and a number W > 0, assume there is a cut (S, V \ S) such that
|{ {u, v} ∈ E | u ∈ S, v ∈ V \ S }| ≥ W
W > 0 ⇒ S ≠ ∅, V \ S ≠ ∅
Idea: use the u* vertices for balancing. Let j = n − |S|.
Define V_1 := S ∪ {u_1, ..., u_j}, V_2 := V' \ V_1 → |V_1| = |V_2| = n
Cut size:
|{ {u, v} ∈ E' | u ∈ V_1, v ∈ V_2 }|
= n² − |{ {u, v} ∉ E' | u ∈ V_1, v ∈ V_2 }|
= n² − |{ {u, v} ∈ E | u ∈ S, v ∈ V \ S }|
≤ n² − W = W'
Max-Cut ≤_p Balanced Partition: w ∈ Max-Cut ⇐ f(w) ∈ Balanced Partition
Given a Balanced Partition V ′ = V1 ∪ V2, |V1| = |V2| = nsuch that
|u, v ∈ E′ | u ∈ V1, v ∈ V2| ≤ n2 −W = W ′
Define S := V1 ∩ V (remove u∗ vertices)
Cut-Size:
|u, v ∈ E | u ∈ S, v ∈ V \S|=|u, v 6∈ E′ | u ∈ V1, v ∈ V2|=n2 − |u, v ∈ E′ | u ∈ V1, v ∈ V2|≥n2 − (n2 −W ) = W
Optimization Problems
Definition
A minimization problem is defined via a pair (L, f). Here, L is the set of feasible solutions and f : L → R is the cost function.

Definition
x* ∈ L is an optimal solution iff f(x*) ≤ f(x) ∀ x ∈ L

Definition
A minimization algorithm achieves approximation ratio ρ if for all inputs I it produces a solution x(I) such that
f(x(I)) ≤ ρf(x∗(I))
where x∗(I) denotes the optimum solution for input I.
More Hardness Results
Theorem (Andreev, Räcke)
For k ≥ 3 the perfectly balanced partitioning problem has no polynomial-time approximation algorithm with finite approximation factor unless P = NP.

Proof.
Reduction from 3-Partition.
Reduction from 3-Partition
Definition
Given n = 3k integers a1, a2, . . . , an and a threshold S such that S/4 < ai < S/2 and Σ_{i=1}^{n} ai = kS.
Decide if the numbers can be partitioned into triples such that each triple adds up to S.
strongly NP-complete [Garey, Johnson '79] :⇔ still NP-complete if the numbers are bounded by a polynomial in n
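A brute-force decider makes the problem statement concrete (exponential search, tiny instances only; the example numbers are my own):

```python
from itertools import combinations

def three_partition(nums, S):
    """Decide 3-Partition by exhaustive search.
    Assumes len(nums) = 3k; feasible only when every triple sums to S."""
    nums = list(nums)
    if not nums:
        return True
    first, rest = nums[0], nums[1:]
    # try every pair completing the first element to a triple of sum S
    for i, j in combinations(range(len(rest)), 2):
        if first + rest[i] + rest[j] == S:
            remaining = [x for t, x in enumerate(rest) if t not in (i, j)]
            if three_partition(remaining, S):
                return True
    return False

# k = 4, S = 90, all numbers in (S/4, S/2); built from four triples of sum 90
yes = [23, 30, 37, 25, 31, 34, 26, 29, 35, 27, 28, 35]
assert three_partition(yes, 90)
# perturbing one number breaks the total sum 4*S, so no partition exists
assert not three_partition([23, 30, 38] + yes[3:], 90)
```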
Reduction from 3-Partition
Continued
Given an instance of 3-partition (with poly. bounded integers).Construct graph G: for each ai insert clique of size ai
(Figure: disjoint cliques of sizes a1, a2, a3, . . . , an)
→Graph size poly. bounded since integers are poly. bounded
With a finite-factor approximation algorithm for k-balanced partitioning (ε = 0) we could do the following:
if 3-Partition is solvable, the k-balanced partition has cut zero
otherwise: the optimum partition cuts at least one edge
Conductance Clustering
NP-completeness [Sima, Schaeffer 2008]
Blackboard
What's Next
A little bit of Theory

NP-Hardness of GP and GC
No finite factor approximation for perfectly balanced GP
Integer linear programs for both
Spectral graph partitioning and clustering
Linear Programming
A linear program with n variables and m constraints is defined throughthe following minimization/maximization problem:
Linear cost function f(x) = c · x, where c denotes the cost vector
m constraints ai · x ./i bi with ./i ∈ {≤, ≥, =}, ai ∈ R^n
We get
L = { x ∈ R^n : ∀ j ∈ 1..n : xj ≥ 0 ∧ ∀ i ∈ 1..m : ai · x ./i bi }.
Example
Shortest Paths

maximize Σ_{v ∈ V} dv
s.t. ds = 0
dw ≤ dv + c(v, w) for all (v, w) ∈ E
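A quick check of this LP on a toy digraph (my own example): the Bellman-Ford distances satisfy every constraint dw ≤ dv + c(v, w), and since raising any dw would violate the constraint along its shortest-path edge, they maximize the objective:

```python
# toy digraph: edge list (v, w, cost); node 0 is the source s
edges = [(0, 1, 2), (0, 2, 5), (1, 2, 1), (2, 3, 2), (1, 3, 6)]
n = 4

# shortest-path distances via Bellman-Ford style relaxation
d = [float("inf")] * n
d[0] = 0
for _ in range(n - 1):
    for v, w, c in edges:
        d[w] = min(d[w], d[v] + c)

# the distances are a feasible point of the LP: d_s = 0 and d_w <= d_v + c(v, w)
assert d[0] == 0 and all(d[w] <= d[v] + c for v, w, c in edges)
print(d)
```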
Algorithms and Implementation
LPs solvable in polynomial time [Khachiyan 1979]
Worst case O(max(m, n)^{7/2})
In practice much faster
Robust, efficient implementations are very complex
free and commercial software packages
Integer Linear Programming
ILP: Integer Linear Program, a linear program with the additional constraint xi ∈ Z; often a 0/1 ILP, i.e., xi ∈ {0, 1}
MILP: Mixed Integer Linear Program, linear program with someinteger variables.
Linear Relaxation:Remove integer constraints of a (M)ILP
Optimization: ILP Approach
Clustering

1. introduce decision variables
∀ {u, v} ∈ (V choose 2): Xuv = 0 if C(u) = C(v), and Xuv = 1 otherwise
2. ensure a valid clustering with constraints (transitivity): ∀ {u, v, w} ∈ (V choose 3):
Xuv + Xvw − Xuw ≥ 0
Xuv + Xuw − Xvw ≥ 0
Xuw + Xvw − Xuv ≥ 0
3. reflexivity and symmetry for free
Optimization: ILP Approach
Clustering

4. optimize the target function, e.g., modularity:
modILP(G, CG) = Σ_{ {u,v} ∈ (V choose 2) } ( 1_{ {u,v} ∈ E } − deg(u) · deg(v) / (2m) ) · (1 − Xuv)
Countless other constraints and objectives are possible, e.g.:
intra-/inter-expansion as constraint or objective
multicriteria objective functions
. . .
Example runtimes: modularity, 300 vertices, 1 day
[Görke: An algorithmic walk from static to dynamic graph clustering, 2010][Schumm et al.: Density-constrained graph clustering (technical report), 2011]
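For very small graphs the ILP solver can be replaced by plain enumeration, which makes the objective above easy to test (the barbell graph and the brute-force search are my own illustration):

```python
from itertools import product

# toy graph: two triangles joined by one edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n, m = 6, len(edges)
deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
E = {frozenset(e) for e in edges}

def mod_ilp(labels):
    """The slide's pairwise objective: sum over same-cluster pairs
    (i.e., the X_uv = 0 terms) of 1_{uv in E} - deg(u)deg(v)/(2m)."""
    return sum((1 if frozenset((u, v)) in E else 0) - deg[u] * deg[v] / (2 * m)
               for u in range(n) for v in range(u + 1, n)
               if labels[u] == labels[v])

# brute force over all label assignments (a stand-in for the ILP solver)
best = max(product(range(n), repeat=n), key=mod_ilp)
clusters = {tuple(sorted(i for i in range(n) if best[i] == c)) for c in set(best)}
print(clusters, mod_ilp(best))
```

On this graph the optimum recovers the two triangles as clusters.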
Optimization: ILP Approach
Balanced Bipartitions [Brillout]

1. introduce decision variables
∀ e = {u, v} ∈ E: euv := 1 if e is a cut edge, 0 otherwise
∀ v ∈ V: xv := 0 if v is in block 0, 1 if v is in block 1
2. ensure valid partition constraints:
∀ {u, v} ∈ E: euv ≥ |xu − xv| (1)
Σ_{v ∈ V} xv c(v) ≤ U (2)
Σ_{v ∈ V} xv c(v) ≥ L (3)
Optimization: ILP Approach
Balanced Bipartitions [Brillout]

3. objective: min Σ_{ {u,v} ∈ E } euv ω(u, v)

|·| is not a linear constraint! Replace it with:
euv ≥ xu − xv
euv ≥ xv − xu
if u, v are in the same block, both right-hand sides are 0; otherwise one is 1 and the other is −1
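The effect of the two inequalities can be verified exhaustively: in a minimization, the smallest feasible euv equals |xu − xv| for 0/1 variables:

```python
# check the linearization of e_uv >= |x_u - x_v| on all 0/1 assignments:
# the smallest feasible e_uv is exactly |x_u - x_v|
for xu in (0, 1):
    for xv in (0, 1):
        feasible = [e for e in (0, 1) if e >= xu - xv and e >= xv - xu]
        assert min(feasible) == abs(xu - xv)
print("linearization is exact on 0/1 variables")
```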
Optimization: ILP Approach
Extension to Balanced k-Partitions [Unpub]

1. introduce decision variables
∀ e = {u, v} ∈ E: euv := 1 if e is a cut edge, 0 otherwise
∀ v ∈ V and each block k: xv,k := 1 iff v is in block k
2. ensure valid k-partition constraints:
∀ {u, v} ∈ E ∀ k: euv ≥ |xu,k − xv,k| (1)
∀ k: Σ_{v ∈ V} xv,k c(v) ≤ U (2)
∀ k: Σ_{v ∈ V} xv,k c(v) ≥ L (3)
∀ v ∈ V: Σ_k xv,k = 1 (4)
Optimization: ILP Approach
Extension to Balanced k-Partitions [Unpub]

3. objective (as before): min Σ_{ {u,v} ∈ E } euv ω(u, v)

Valid constraints:
∀ v: Σ_k xv,k = 1: each vertex is assigned to exactly one block
u and v in the same block ⇒ |xu,k − xv,k| = 0 ∀ k ⇒ euv = 0, since edge weights are positive
otherwise: ∃ k : |xu,k − xv,k| ≥ 1 ⇒ euv = 1
eliminate |·| as before
other objectives, more constraints etc. are possible
too slow in practice for larger values of k
What's Next
A little bit of Theory

NP-Hardness of GP and GC
No finite factor approximation for perfectly balanced GP
Integer linear programs for both
Spectral graph partitioning and clustering
Spectral Graph Partitioning
What are the connections of eigenvectors/values and cuts in graphs?
Lx = λx  ?↔  (S, V\S)
We learn:
Properties of the Laplacian L of G
Why the Laplacian?
How do we compute cuts using spectral techniques
How do we compute eigenvectors and eigenvalues quickly
Main Techniques
Blackboard
Examples
Degree matrix, etc.

Degree matrix D:
2 0 0 0 0 0
0 3 0 0 0 0
0 0 2 0 0 0
0 0 0 3 0 0
0 0 0 0 3 0
0 0 0 0 0 1

Adjacency matrix A:
0 1 0 0 1 0
1 0 1 0 1 0
0 1 0 1 0 0
0 0 1 0 1 1
1 1 0 1 0 0
0 0 0 1 0 0

Laplacian matrix L = D − A:
 2 −1  0  0 −1  0
−1  3 −1  0 −1  0
 0 −1  2 −1  0  0
 0  0 −1  3 −1 −1
−1 −1  0 −1  3  0
 0  0  0 −1  0  1
Theorems
Theorem
The Laplace matrix is symmetric iff G is undirected.

Theorem
The Laplace matrix is positive semi-definite.

Theorem
The multiplicity of the eigenvalue 0 in the spectrum of the Laplace matrix is equal to the number of connected components of the graph.
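The first two theorems can be checked directly on the example graph from the slides; the quadratic form identity x^T L x = Σ over edges {u,v} of (xu − xv)², which makes L positive semi-definite, is also easy to test numerically:

```python
# the 6-vertex example graph from the slides
edges = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4), (3, 5)]
n = 6
deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

# Laplacian L = D - A
L = [[0] * n for _ in range(n)]
for i in range(n):
    L[i][i] = deg[i]
for u, v in edges:
    L[u][v] -= 1
    L[v][u] -= 1

# symmetry (G is undirected)
assert all(L[i][j] == L[j][i] for i in range(n) for j in range(n))
# the all-ones vector is an eigenvector to eigenvalue 0 (row sums vanish)
assert all(sum(row) == 0 for row in L)
# positive semi-definiteness: x^T L x = sum over edges of (x_u - x_v)^2 >= 0
x = [3.0, -1.0, 2.0, 0.5, -2.0, 4.0]   # arbitrary test vector
quad = sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))
assert abs(quad - sum((x[u] - x[v]) ** 2 for u, v in edges)) < 1e-9 and quad >= 0
```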
Examples
Eigenvector to Eigenvalue 0

The all-ones vector is an eigenvector to eigenvalue 0:

[  2 −1  0  0 −1  0 ]   [1]   [0]
[ −1  3 −1  0 −1  0 ]   [1]   [0]
[  0 −1  2 −1  0  0 ] · [1] = [0]
[  0  0 −1  3 −1 −1 ]   [1]   [0]
[ −1 −1  0 −1  3  0 ]   [1]   [0]
[  0  0  0 −1  0  1 ]   [1]   [0]
Lagrange Multipliers
Given a constrained optimization problem
min_{x ∈ Ω} f(x) subject to h(x) = 0
the Lagrangian is defined as
L(x, λ) = f(x) + λh(x).
Then x∗ is a local minimum⇔ there exists a unique λ∗ s.t.
∇xL(x∗, λ∗) = 0
∇λL(x∗, λ∗) = 0
yT (∇2xxL(x∗, λ∗))y ≥ 0 ∀y s.t. ∇xh(x∗)T y = 0
Spectral Partitioning
Recipe

Algorithm:
1. build the Laplace matrix L (sparse representation!)
2. compute the eigenvector x of the smallest non-trivial eigenvalue
3. compute the median of the xi
4. compute (V1, V\V1) by splitting at the median
Lx = λx↔ (V1, V \V1)
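Steps 3 and 4 of the recipe, sketched on the eigenvector from the example slide (0-based vertex indices):

```python
# eigenvector of the smallest non-trivial eigenvalue (from the example slide)
x = [0.28, 0.19, 0.08, 0.11, -0.90, 0.24]

# step 3: median of the entries
xs = sorted(x)
median = (xs[len(xs) // 2 - 1] + xs[len(xs) // 2]) / 2   # = 0.15 here

# step 4: split at the median
V1 = [i for i in range(len(x)) if x[i] <= median]
V2 = [i for i in range(len(x)) if x[i] > median]
print(median, V1, V2)
```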
Example

λ2 = 2.22 and corresponding eigenvector
x = (0.28, 0.19, 0.08, 0.11, −0.90, 0.24)^T

Median = 0.15
(Figure: the weighted example graph; vertices with xi below the median form one block, the rest the other)
Spectral Partitioning
How can we do this fast? [Barnard, Simon '93]

Multilevel Spectral Bisection (rough outline):
Contraction: create a series of smaller graphs
Interpolation: transfer the eigenvector of the coarse level to the finer level
Refinement: given an approximate eigenvector, make it more accurate

On the coarsest level: Lanczos iteration
Refinement: Rayleigh quotient iteration, takes advantage of the initial vector
History: importance of spectral partitioning → ML for eigenvectors → ML for GP
Spectral Modularity Maximization

Rewrite modularity as:

mod(C) = (1 / 2m) Σ_{i,j} ( A_ij − d(i) d(j) / (2m) ) δ(i, j)

A adjacency matrix
δ(i, j) = 1 ⇔ i is in the same cluster as j
Spectral Modularity Maximization
Consider the case of only 2 clusters

Let si be +1 if node i is in cluster 1, and −1 if node i is in cluster 2.

⇒ δ(i, j) = (si sj + 1) / 2

mod(C) = (1 / 4m) Σ_{i,j} ( A_ij − d(i) d(j) / (2m) ) (si sj + 1)
       = (1 / 4m) Σ_{i,j} ( A_ij − d(i) d(j) / (2m) ) si sj
       = (1 / 4m) Σ_{i,j} B_ij si sj = (1 / 4m) s^T B s

(the constant term drops out since Σ_{i,j} ( A_ij − d(i) d(j) / (2m) ) = 2m − (2m)²/(2m) = 0)
Method: relax integer constraint for s with real numbers and sT s = n
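The identity mod(C) = (1/4m) s^T B s can be verified numerically on a small graph (the two-triangle graph is my own toy example):

```python
# verify mod(C) = (1/4m) s^T B s on a toy graph (two triangles + a bridge)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n, m = 6, len(edges)
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
d = [sum(row) for row in A]
B = [[A[i][j] - d[i] * d[j] / (2 * m) for j in range(n)] for i in range(n)]

s = [1, 1, 1, -1, -1, -1]            # cluster 1 vs. cluster 2
stbs = sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n)) / (4 * m)

# direct modularity: (1/2m) * sum of B_ij over ordered same-cluster pairs
direct = sum(B[i][j] for i in range(n) for j in range(n) if s[i] == s[j]) / (2 * m)
assert abs(stbs - direct) < 1e-12
print(stbs)
```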
Spectral Modularity Maximization
Consider the case of only 2 clusters

Lagrangian (as before):
L(s, λ) = s^T B s + λ(n − s^T s)

The optimality conditions yield
Bs = λs
λ will correspond to the largest eigenvalue (why?)
Practice:
choose a clustering s̃ close to s, i.e. s̃i = 1 if si > 0, s̃i = −1 otherwise
Lessons Learned
So far ...

What is GP, GC?
Objective functions and principles
Applications
Why are GP, GC hard?
Exact solutions: ILPs
Spectral techniques for GP and GC
Lecture Overview
Fundamentals, Problem Definitions and Objective Functions
Lots of Applications
NP-Hardness of GP and GC
Exact Partitioning/Clustering
Spectral Partitioning/Clustering
Local Search, Multilevel Algorithms
Parallel, External and Semi-External Algorithms
Greedy Agglomeration / Top-Down Approaches
Min-Cut Tree Clustering
Evolutionary Algorithms and Meta-Heuristics
Dynamic Clustering, Online Algorithms
Multi-Level Graph Partitioning
Successful in existing systems:
Metis, Scotch, Jostle, DiBaP, . . . , KaPPa, KaHIP
Multilevel Graph Partitioning
1. Local Search
2. Contraction
3. Global Search
4. Other Approaches to MGP
Local Search
Think Globally, Act Locally

find some feasible solution x ∈ L
x̂ := x  –– x̂ is the best solution found so far
while not satisfied with x̂ do
    x := some heuristically chosen element from N(x) ∩ L
    if f(x) < f(x̂) then x̂ := x
Hill Climbing
find some feasible solution x ∈ L
x̂ := x  –– best solution found so far
loop
    if ∃ x ∈ N(x̂) ∩ L : f(x) < f(x̂) then x̂ := x
    else return x̂  –– local optimum found
Problem: Local Optima
(Figure: plot of sin(x)/x on [−50, 50] – a function with many local optima)
Why the Neighborhood is important
(Figure: 3D plot of a function f(x, y) – whether a point is a local optimum depends on the chosen neighborhood)
Kernighan-Lin Algorithm
Main Ideas

Given an initial bisection, we want to improve it.
Kernighan and Lin's main idea:

(Figure: blocks V1, V2 with exchange sets A* and B*)

Let (V*1, V*2) be an optimal partition and (V1, V2) be any partition. Then there are sets A*, B* ⊆ V such that
V*1 = V1 − A* + B*
V*2 = V2 + A* − B*
identifying A*, B* is still NP-hard – why?
The KL algorithm tries to find "good" approximations for A*, B*
Kernighan-Lin Algorithm
Find "good" approximations for A*, B*

Let (V1, V2) be any given bipartition. For a vertex v ∈ V define a gain g(v) as
g(v) = dext(v) − dint(v)
where
dext(v) = #neighbors of v in the opposite block
dint(v) = #neighbors of v in the same block

Gain of a node exchange:
g(v1, v2) = g(v1) + g(v2) − 2ω(v1, v2) if {v1, v2} ∈ E, and
g(v1, v2) = g(v1) + g(v2) otherwise
“cut decrease if pair (v1, v2) is exchanged”
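A minimal sketch of the gain computations (the bisection and unit edge weights are my own toy example):

```python
# toy bisection: V1 = {0, 1}, V2 = {2, 3}
edges = {(0, 1): 1, (2, 3): 1, (0, 2): 1, (1, 3): 1, (1, 2): 1}  # unit weights
blocks = {0: 0, 1: 0, 2: 1, 3: 1}

def w(u, v):
    return edges.get((u, v), edges.get((v, u), 0))

def gain(v):
    # g(v) = d_ext(v) - d_int(v)
    ext = sum(c for (a, b), c in edges.items()
              if v in (a, b) and blocks[a] != blocks[b])
    intern = sum(c for (a, b), c in edges.items()
                 if v in (a, b) and blocks[a] == blocks[b])
    return ext - intern

def pair_gain(v1, v2):
    # the -2w term only matters when {v1, v2} is an edge; w is 0 otherwise
    return gain(v1) + gain(v2) - 2 * w(v1, v2)

assert [gain(v) for v in range(4)] == [0, 1, 1, 0]
assert pair_gain(1, 2) == 0   # swapping 1 and 2 leaves the cut at 3
assert pair_gain(0, 3) == 0   # no edge {0, 3}: just g(0) + g(3)
```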
Kernighan-Lin Algorithm
Find "good" approximations for A*, B*

find candidate list L:
    find the best unmarked pair (v1, v2) ∈ V1 × V2 with respect to g(v1, v2)
    "move" the vertices, update the gain values of their neighbors
    add the pair to the list L
    mark v1 and v2
→ repeat until no unmarked vertices are left

construct A*, B*:
    we have an ordered list L of pairs
    find ℓ such that φ(ℓ) := Σ_{i=1}^{ℓ} g(vi, wi) is maximum
    define A* := {v1, . . . , vℓ} and B* := {w1, . . . , wℓ}
    exchange A* and B*
Key feature: ability to climb out of local minima to some extent
Kernighan-Lin Algorithm
Algorithm 1 Kernighan-Lin Local Search
Data: G = (V, E), initial bisection V1, V2
forall v ∈ V do compute gain g(v)
repeat
    ordered list L ← ∅
    unmark all nodes v ∈ V
    for i = 1 to min(|V1|, |V2|) do
        (v1, v2) ← argmax over unmarked v1 ∈ V1, v2 ∈ V2 of g(v1, v2)
        update the g-values for all v ∈ N(v1) ∪ N(v2)
        append (v1, v2) to L and mark v1, v2
    j ← argmax_k φ(k); γ ← φ(j)
    if γ > 0 then exchange the first j node pairs
until γ ≤ 0
Analysis:
Running time O(n³) per round (obvious?)
Sorting gain values → O(n² log n) [Kernighan, Lin '69]
Fiduccia-Mattheyses Local Search
The KL algorithm is too slow in practice.
[Fiduccia and Mattheyses '82] achieve O(m) running time by
1. selecting vertices independently for exchange (single moves instead of swaps)
2. using clever data structures
again: perform passes where each node is moved at most once
select the best feasible partition afterwards
Algorithm 2 Fiduccia-Mattheyses Local Search
Data: G = (V, E), initial bisection V1, V2
forall v ∈ V do compute gain g(v)
repeat
    forall v ∈ V1 do put g(v) into bucket queue Q1
    forall v ∈ V2 do put g(v) into bucket queue Q2
    ordered list L ← ∅
    unmark all nodes v ∈ V
    while ∃ an unmarked node do
        alternately select a queue Q ∈ {Q1, Q2}
        v ← max-gain node in Q
        move v to the other block, append it to L, mark v
        remove v from Q
        forall w ∈ N(v) do update its g-value
    j ← argmax_ℓ φ(ℓ) := Σ_{i=1}^{ℓ} (g(vi) + g(wi))
    γ ← φ(j); if γ > 0 then apply the changes
until γ ≤ 0
FM Local Search
Example

compute gain: g(v) = dext(v) − dint(v)
alternate between blocks – move nodes greedily
edge-cut: 7
(Animation: node gains are displayed next to the graph and updated after every move)
recalculate the gain g(v) of the neighbors after each move
move each node at most once
repeat until the stop criterion is reached; worse cuts are allowed in between
edge-cut progression: 7, 6, 5, 5, 6
FM Local Search
Example

(Figure: edge-cut plotted over the move steps)
allow worse cuts in between
undo the changes back to the best feasible solution
FM Local Search
Example

final partition with edge-cut 5
linear-time implementation → next
Bucket Priority Queue... used in the FM-Algorithm
Observation: ∀v ∈ V : g(v) ∈ [−∆, . . . ,∆]
(Figure: array of buckets for gains in [−∆, ∆]; a pointer pmax marks the highest non-empty bucket; |V| node entries in total)
Analysis
Operations

Insert(v): find the bucket, append the node; store a pointer to the element; maybe increase pmax
Remove(v): get the pointer to the element; remove the element; update pmax
UpdateKey(v): Remove(v); Insert(v)
DeleteMax(): use pmax, return a max element; Remove(v)
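A compact sketch of the bucket queue (Python lists stand in for the doubly linked lists a real FM implementation would use for O(1) removal):

```python
class BucketQueue:
    """Bucket priority queue for integer gains in [-maxgain, maxgain]."""

    def __init__(self, maxgain):
        self.off = maxgain                        # index offset for negative gains
        self.buckets = [[] for _ in range(2 * maxgain + 1)]
        self.pmax = -1                            # highest non-empty bucket, -1 = empty

    def insert(self, v, gain):
        b = gain + self.off
        self.buckets[b].append(v)
        self.pmax = max(self.pmax, b)             # maybe increase pmax

    def remove(self, v, gain):
        b = gain + self.off
        self.buckets[b].remove(v)                 # O(bucket size) here; O(1) with linked lists
        while self.pmax >= 0 and not self.buckets[self.pmax]:
            self.pmax -= 1                        # cost amortized against earlier increases

    def update_key(self, v, old_gain, new_gain):
        self.remove(v, old_gain)
        self.insert(v, new_gain)

    def delete_max(self):
        v = self.buckets[self.pmax].pop()         # any element of the top bucket is a max
        g = self.pmax - self.off
        while self.pmax >= 0 and not self.buckets[self.pmax]:
            self.pmax -= 1
        return v, g

q = BucketQueue(3)
q.insert("a", -2); q.insert("b", 1); q.insert("c", 1)
q.update_key("a", -2, 3)
assert q.delete_max() == ("a", 3)
assert q.delete_max()[1] == 1
```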
Analysis
Operations

v ← max-gain node in Q; move v; remove v from Q; forall w ∈ N(v) do update its g-value

Let v1, . . . , v, w, . . . , vn be the order in which the nodes are moved.
deleteMax(v) → O(d(v) + d(w))
tight iff g(v) = d(v), g(w) = −d(w)
amortization with the update of the neighbors
overall time O(m)
Note: gain updated by keeping dext, dint up to date
Optimizations
Block selection strategies → Quality
1. TopGain: select the block yielding the larger gain
2. MaxBlock: select the larger block
Move only boundary nodes → Speed
Repeated runs with different random seeds → Quality
Localization of the local search → Quality
Extensions
k-partitioning

Due to the lack of global knowledge, recursive bisection can be far away from the optimal partition [Simon, Teng '97] → there is a need for k-way local search.
k-way local search [Sanchis '89; Hendrickson, Leland '95]:
1. k(k − 1) PQs, one for each type of move
2. find the PQ maximizing the gain
3. perform the movement that preserves or improves the balance
k-way linear-time implementation [Karypis, Kumar '98]:
1. one PQ for all types of moves
2. use max_P g_P(v) as the key
pair-wise local search between adjacent block pairs?
Tabu search
More expensive local search

General framework:

find some feasible solution x ∈ L
x̂ := x  –– x̂ is the best solution found so far
T ← ∅
while not satisfied with x̂ do
    x := argmin over x ∉ T of N(x) ∩ L
    add x to T for some iterations
    if f(x) < f(x̂) then x̂ := x

Graph partitioning:
a node moved more than once? moving it back is tabu for x iterations
if v is in block A, then the move (v, A) is tabu for f(i) iterations
Local Search
Disadvantage

Unlikely: improvements that need 2 or more negative gain moves

(Figure: cut value over #steps – the cut must rise over several moves before it can drop)
More Localized Local Search
Typical: k-way local search initialized with the complete boundary
Localization:
1. complete boundary ⇒ maintained todo list T
2. initialize the search with a single random node v ∈ T
3. iterate until T = ∅
each node is moved at most once
Perfectly Balanced Case
Classical Local Search
More Blocks – ε = 0

each k-way move makes the partition infeasible
→ local search restricted to pairs of blocks
Perfectly Balanced Case
Basic Idea – Cycles

every block emits and receives one node
develop a model: negative cycle ↔ improvement
Basic Idea
Cycles

develop a model: negative cycle ↔ improvement
select candidates for each directed edge
Basic Idea
Use Negative Cycles

(Figure: blocks A, B, C and the model graph on {A, B, C} with edge weights 0 and −1 derived from candidate moves)

Steps:
1. select candidates (w.r.t. the local gain)
2. build the model such that a negative cycle → improvement
3. detect a negative cycle
4. perform the node movements
Augmentation: zero gain diversification (zero weight cycle detection)
Building the Model
Local gain: gA,B(v) is the reduction in the cut when v ∈ A is moved to block B
Building the model:
unmark all v ∈ V
compute the directed quotient graph Q
for each edge (A, B) in Q in random order:
    find an unmarked vertex v in block A maximizing the gain gA,B
    mark v and all its neighbors
    set the weight of (A, B) to −gA,B(v)
Notes
Remark 1:
let C be any cycle in Q. Then moving the candidate vertices associated with each edge of the cycle reduces the cut by the weight of the cycle.

Theorem
There is a negative cycle in Q iff moving a subset of the candidate vertices keeps the balance of the partition and reduces the cut.

Remark 3:
by construction the set of candidates is an independent set. If this were not the case, Remark 1 would not hold and our model would not work.
Finding Negative Cycles
1: procedure BellmanFord(s : NodeId) : NodeArray × NodeArray
2:   d = ⟨∞, …, ∞⟩ : NodeArray of R ∪ {−∞, ∞}
3:   parent = ⟨⊥, …, ⊥⟩ : NodeArray of NodeId
4:   d[s] := 0; parent[s] := s
5:   for i := 1 to n − 1 do
6:     forall e ∈ E do relax(e)
7:   find negative cycle
8:   …
9:   return
A negative cycle is a directed cycle with weight < 0.
Question: does a negative cycle exist in G (and output one)?
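The routine above, with the cycle extraction of line 7 filled in, can be sketched like this; walking n parent pointers back from an improvable edge is the standard trick to land inside the cycle (names are my own):

```python
def bellman_ford_negative_cycle(n, edges, s):
    """Bellman-Ford from s on edges (u, v, cost); after n-1 rounds, any
    edge with d[u] + c < d[v] leads into a negative cycle.
    Returns the cycle as a node list, or None."""
    INF = float('inf')
    d = [INF] * n
    parent = [None] * n
    d[s] = 0
    for _ in range(n - 1):
        for u, v, c in edges:
            if d[u] + c < d[v]:
                d[v] = d[u] + c
                parent[v] = u
    for u, v, c in edges:
        if d[u] + c < d[v]:            # still improvable: negative cycle
            parent[v] = u
            x = v
            for _ in range(n):         # n parent steps land on the cycle
                x = parent[x]
            cycle, y = [x], parent[x]
            while y != x:
                cycle.append(y)
                y = parent[y]
            return cycle[::-1]
    return None
```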
Finding Negative Cycles
General: add an additional node H and edges (H, v) for all v ∈ V with weight 0
After execution of Bellman-Ford (in line 7): for every negative cycle C there exists an edge e = (u, v) ∈ C with d[u] + c(e) < d[v]
Bellman-Ford – Running time
Running time O(nm): very slow! There are variants of the algorithm with a much better best case.
Here: the size of the model depends on the parameter k (#blocks) → the model is fairly small
Zero-Weight Cycles
Given a directed graph G = (V, E), is there a zero-weight cycle?
NP-complete:
SubsetSum Problem:
Given S = {a₁, …, aₙ} and W.
Question: ∃ I ⊆ {1, …, n} : Σ_{i∈I} aᵢ = W
SubsetSum ≤ₚ ZeroWeightCycle:
[Figure: reduction gadget with chain nodes v₁, v₂, …, vₙ₊₁, detour nodes u₁, …, uₙ carrying weights a₁, …, aₙ, zero-weight bypass edges, and a closing edge of weight −W]
Zero-Weight Cycles: Things are simpler without negative cycles!
Given a directed graph G = (V,E) that doesn’t contain negative cycles.Question: Is there a zero-weight cycle?
Algorithm:
insert node H, edges (H, v) ∀ v ∈ V
compute shortest-path tree starting at H
shortest-path distances Π : V → R
new edge weights: ℓ(e = (A,B)) := ω(e) + Π(A) − Π(B)
Observations: the ℓ-weight of a cycle equals its ω-weight; ℓ(e) ≥ 0 ∀ e ∈ E → evict edges with ℓ(e) > 0
if ∃ SCC with more than one node → ∃ zero-weight cycle; output: perform a random walk in the component to find the cycle
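These steps can be sketched end to end; a minimal sketch assuming integer weights and no negative cycles (it detects a non-trivial SCC of tight edges rather than outputting the cycle; names are my own):

```python
def has_zero_weight_cycle(n, edges):
    """Assuming no negative cycles: Bellman-Ford from a virtual source H
    (equivalently, start with d = 0 everywhere) yields potentials; the
    reweighted l(e) = w + d[u] - d[v] is >= 0, keep only l(e) == 0 edges
    and look for an SCC with more than one node there."""
    d = [0] * n                       # d after relaxing the 0-weight H edges
    for _ in range(n - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    zero_adj = [[] for _ in range(n)]
    for u, v, w in edges:
        if w + d[u] - d[v] == 0:      # tight edge: may lie on a 0-cycle
            zero_adj[u].append(v)

    order, seen = [], [False] * n     # Kosaraju's SCC algorithm
    def dfs(u, adj, out):
        stack = [(u, iter(adj[u]))]
        seen[u] = True
        while stack:
            x, it = stack[-1]
            for y in it:
                if not seen[y]:
                    seen[y] = True
                    stack.append((y, iter(adj[y])))
                    break
            else:
                out.append(x)
                stack.pop()
    for u in range(n):
        if not seen[u]:
            dfs(u, zero_adj, order)
    radj = [[] for _ in range(n)]
    for u in range(n):
        for v in zero_adj[u]:
            radj[v].append(u)
    seen = [False] * n
    for u in reversed(order):
        if not seen[u]:
            comp = []
            dfs(u, radj, comp)
            if len(comp) > 1:         # non-trivial SCC of tight edges
                return True
    return False
```

(Self-loops of weight 0 would need an extra check; they do not occur in the quotient-graph model.)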
Running Time
insert node H + edges O(n)
shortest-path tree O(nm)
new weights + evict edges O(n+m)
strongly connected components O(n + m) (DFS)
random walk O(|SCC|) → O(n)
→ overall running time O(nm)
Note: shortest-path algorithm must handle negative edge weights→ we cannot use Dijkstra’s algorithm
Basic Idea: Use Negative Cycles
Steps:
1. build model
2. detect negative cycle
3. if none contained → find zero-weight cycle
4. perform node movements
Until stopping criterion reached
General Structure
in practice we don't have an input partition
create a partition with ε > 0 imbalance (reuse local search)
apply balancing algorithms and neg. cycle local search
[Figure: multilevel scheme (contract, initial partitioning, uncontract, local improvement) producing an imbalanced partition that is then balanced via the layered cycle model with source s]
Balancing: Balance imbalanced partitions!
Before: s connected to all blocks in order to detect a cycle improvement; repeat the whole process until all negative cycles are eliminated
Balancing: Balance imbalanced partitions!
Now:
s connected to overloaded blocks only
connect underloaded blocks to t
search for a shortest s-t path
many special cases, combinable with the advanced model
Advanced Model: Idea
every block emits and obtains the same number of nodes; develop a model: negative cycle ↔ improvement
Advanced Model
|A| = 14,|B| = 12,|C| = 14
Goal: combine directed local searches (DLS)
each DLS moves at most d nodes
perform between all (directed) pairs of blocks
∀ pairs (A,B): ordered sequence of node movements S(A,B)
Advanced Model: First Approach
[Figure: layered model with d copies of the quotient graph (layers 1, 2, 3) on blocks A, B, C, attached to source s; |A| = 14, |B| = 12, |C| = 14]
d copies of the quotient graph (one per layer)
layer ℓ: edge associated with the first ℓ nodes to move, S*[1, …, ℓ]
weight: negative value of the gain when moving ℓ nodes
a negative cycle within a layer maintains balance
Advanced Model
after performing moves: |A| = 14, |B| = 13, |C| = 13
additional augmentations possible → encode paths
a block can emit more nodes than it receives and vice versa
multiple DLS per block pair (pick the best), and more
Experiments: Walshaw Benchmark – Improving Existing Records
55% of the perfectly balanced records contained positive gain vertices; use each record as input to the refinement
k    Basic   +ZG    Adv.   +MDLS
2     0%      0%     0%     0%
4    18%     24%    41%    44%
8    38%     50%    64%    74%
16   64%     68%    71%    79%
32   76%     76%    88%    91%
64   82%     82%    79%    88%
sum  47%     50%    57%    63%
Basic (Basic Neg. Cycle Impr.), +ZG (+Cycle Diversification),Adv. (Adv. Model + Cycle Div.), +MDLS. (Adv. + MDLS)
Maximum Flows
Literature:
[Mehlhorn / Näher, The LEDA Platform of Combinatorial and Geometric Computing, Cambridge University Press, 1999]
[Ahuja, Magnanti, Orlin, Network Flows, Prentice Hall, 1993]
Algorithmen II (Slides, Script) [Sanders]
Definitions: Network
Network = directed weighted graph with
source node s and sink node t
s has no incoming edges, t has no outgoing edges
weight c_e of an edge e = capacity of e (nonnegative!)
[Figure: example network with edge capacities, source s and sink t]
Definitions: Flows
Flow = function f_e on the edges with 0 ≤ f_e ≤ c_e ∀ e
∀ v ∈ V \ {s, t}: total incoming flow = total outgoing flow
Value of a flow: val(f) = total outgoing flow from s = total flow going into t
Goal: find a flow with maximum value
[Figure: example network with a flow; each edge is labeled flow/capacity]
Definitions: (Minimum) s-t Cuts
An s-t cut is a partition of V into S and T with s ∈ S and t ∈ T.
The capacity of this cut is Σ { c(u,v) : u ∈ S, v ∈ T }.
Duality Between Flows and Cuts
Theorem ([Elias/Feinstein/Shannon, Ford/Fulkerson 1956])Value of an s-t max-flow = minimum capacity of an s-t cut.Proof: Algorithms II
Algorithms 1956–now
Year   Author            Running time
1956   Ford-Fulkerson    O(mnU)
1969   Edmonds-Karp      O(m²n)
1970   Dinic             O(mn²)
1973   Dinic-Gabow       O(mn log U)
1974   Karzanov          O(n³)
1977   Cherkassky        O(n²√m)
1980   Galil-Naamad      O(mn log² n)
1983   Sleator-Tarjan    O(mn log n)
n = number of nodesm = number of arcsU = largest capacity
Year   Author                       Running time
1986   Goldberg-Tarjan              O(mn log(n²/m))
1987   Ahuja-Orlin                  O(mn + n² log U)
1987   Ahuja-Orlin-Tarjan           O(mn log(2 + n√(log U)/m))
1990   Cheriyan-Hagerup-Mehlhorn    O(n³/log n)
1990   Alon                         O(mn + n^{8/3} log n)
1992   King-Rao-Tarjan              O(mn + n^{2+ε})
1993   Phillips-Westbrook           O(mn log_{m/n} n + n² log^{2+ε} n)
1994   King-Rao-Tarjan              O(mn log_{m/(n log n)} n) if m ≥ 2n log n
1997   Goldberg-Rao                 O(min{m^{1/2}, n^{2/3}} · m · log(n²/m) · log U)
Applications
oil pipes
traffic flows on highways
image processing (http://vision.csd.uwo.ca/maxflow-data):
  segmentation
  stereo processing
  multiview reconstruction
  surface fitting
disk/machine/tanker scheduling
matrix rounding
…
graph partitioning!
1. local search
2. contraction
Flows as Local Improvement: Two Blocks
[Figure: graph G with blocks V₁, V₂ and a corridor B around the cut with boundaries ∂₁B and ∂₂B, plus source s and sink t]
area B, such that each (s, t)-min cut is an ε-balanced cut in G
e.g. 2 BFS runs (left, right)
if the size would exceed (1 + ε)·c(V)/2 − c(V₂), stop the BFS
⇒ c(V₂,new) ≤ c(V₂) + (1 + ε)·c(V)/2 − c(V₂) = (1 + ε)·c(V)/2
Flows as Local Improvement: Two Blocks
obtain an optimal cut in B
since each cut in B yields a feasible partition → improved two-partition
advanced techniques possible and necessary
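Growing the corridor B by size-bounded BFS, as described above, can be sketched as follows; a minimal sketch for one side of the cut, with node weights and a weight limit standing in for the (1+ε)c(V)/2 − c(V₂) bound (names are my own):

```python
from collections import deque

def grow_flow_area(adj, weight, block, side, limit):
    """Grow the flow area into block `side` by BFS from the cut boundary,
    stopping before the total node weight would exceed `limit`."""
    frontier = deque(v for v in adj
                     if block[v] == side and any(block[u] != side for u in adj[v]))
    area = set(frontier)
    used = sum(weight[v] for v in frontier)
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if block[u] == side and u not in area:
                if used + weight[u] > limit:
                    return area        # size bound reached: stop the BFS
                area.add(u)
                used += weight[u]
                frontier.append(u)
    return area
```

Running this once per side yields the corridor B on which the max-flow problem is solved.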
Example: 100x100 Grid
Example: Constructed Flow Problem (using BFS)
Example: Apply Max-Flow Min-Cut
Example: Output Improved Partition
Flows as Local Improvement: Adaptive Search
search in larger areas for feasible cuts; adaptively control the size of area B
if the resulting cut is not feasible → shrink B, try again; heuristic for most balanced minimum cuts [Picard et al. 1980]
Most Balanced Minimum Cuts
Given an s-t flow network.
Output: a min cut (S, V \ S) that maximizes min(|S|, |V \ S|)
NP-complete:
Partition Problem:
Given a₁, …, aₙ.
Question: ∃ I ⊆ {1, …, n} : Σ_{i∈I} aᵢ = Σ_{i∉I} aᵢ
Partition ≤ₚ MostBalancedMinimumCuts:
[Figure: gadget with n parallel s-t node pairs carrying the weights a₁, a₂, a₃, …, aₙ]
Note: all non-trivial cuts (S, V \ S) cut n edges
Residual Graph
Given a network G = (V, E, c) and a flow f.
Residual graph G_f = (V, E_f, c^f). For each e ∈ E we have:
e ∈ E_f with c^f_e = c_e − f(e) if f(e) < c(e)
e_rev ∈ E_f with c^f_{e_rev} = f(e) if f(e) > 0
[Figure: network with flow/capacity labels and the corresponding residual graph with residual capacities]
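The residual-graph definition above translates directly into code; a minimal sketch with edges and flow as dictionaries keyed by (u, v) pairs:

```python
def residual_graph(edges, flow):
    """Residual graph of a network `edges` = {(u, v): capacity} under
    `flow` = {(u, v): value}: a forward edge with capacity c - f if
    f < c, and a reverse edge with capacity f if f > 0."""
    res = {}
    for (u, v), c in edges.items():
        f = flow.get((u, v), 0)
        if f < c:
            res[(u, v)] = c - f      # remaining forward capacity
        if f > 0:
            res[(v, u)] = f          # flow can be pushed back
    return res
```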
Most Balanced Minimum Cuts [Picard, Queyranne’80]
Definition (Closed Node Set)
For a graph, a subset C ⊆ V is a closed node set iff for all nodes u, v ∈ V, the conditions u ∈ C and (u, v) ∈ E imply v ∈ C.
Theorem
A cut (S, V \ S) separating s from t is a minimum cut if and only if S is a closed node set in the residual graph that contains s and not t.
Intuition: the cut is saturated, we cannot push more flow over it.
Proof: → blackboard
Most Balanced Minimum Cuts
Theorem (Cut-Theorem)
Let (S, V \ S) be an s-t-cut and f be a flow. Then
val(f) = Σ_{e=(i,j)∈E, i∈S, j∈V\S} f(e) − Σ_{e=(i,j)∈E, i∈V\S, j∈S} f(e).
Moreover, val(f) ≤ c(S, V \ S).
Theorem (Max-Flow Min-Cut Theorem)
The value of a maximal flow equals the minimal capacity of an s-t-cut.
Most Balanced Minimum Cuts: Heuristic
Observation: a cycle in G_f cannot contain both a node of C and a node of V \ C.
Algorithm:
compute SCCs of residual graph Gf in O(n+m)
contract SCCs→ node-weighted DAG O(n+m)
repeat until stopping criterion reached:
  compute a random topological order of the DAG O(n′ + m′)
  sweep in reverse order to obtain closed node sets O(n′ + m′)
return best balanced cut found
[Figure: closed node set sweep over the SCC-contracted DAG]
Local Improvement for k-partitions: Using Flows?
on each pair of blocks
[Figure: multilevel scheme: match, contract, initial partitioning, uncontract, local improvement, output partition]
Local Search or Flows?
Local Search vs. Flows:
local vs. global
multiway vs. two-way
any ε vs. large ε (flows are handicapped for ε ≈ 0)
Combination works best
Flows to Improve Expansion
Given a bisection (A, B) of a graph G = (V, E).
WLOG a = |A| ≤ |B|
c = |E(A, B)|: number of edges crossing the cut
ψ(A, V \ A) := |E(A, V \ A)| / min{|A|, |V \ A|} = c/a
Build Directed Flow Network:
Discard all B nodes
Discard every edge connecting pairs of B nodes
Replace every edge in A by two directed edges (cap. a)
Add source S, sink T
Connect S to boundary vertices in A (cap. a)
Connect all nodes in A to T (cap. c)
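The construction steps above can be sketched as code; this follows the capacities exactly as listed (internal edges and source edges get capacity a, sink edges capacity c), and the function name and dictionary representation are my own:

```python
def build_expansion_network(adj, block, A_id):
    """Directed flow network for block A: a = |A|, c = #cut edges; every
    internal edge of A becomes two directed edges of capacity a, S feeds
    each boundary vertex of A with capacity a, and every node of A is
    connected to T with capacity c."""
    A = [v for v in adj if block[v] == A_id]
    a = len(A)
    c = sum(1 for v in A for u in adj[v] if block[u] != A_id)
    cap = {}
    for v in A:
        boundary = False
        for u in adj[v]:
            if block[u] == A_id:
                cap[(v, u)] = a      # internal edge (both directions overall)
            else:
                boundary = True      # the edge into B itself is discarded
        if boundary:
            cap[('S', v)] = a
        cap[(v, 'T')] = c
    return cap
```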
Flows to Improve Expansion: Setting up the Max-Flow Problem and Interpreting its Solution
Theorem (Lang, Rao’04)
There is an improved quotient cut (A′, B′) (with A′ ⊂ A) iff the maximum flow is less than c·a
Expansion Improvement: Example
[Figure: METIS vs. METIS+MQI]
Multilevel Graph Partitioning
1. Local Search
2. Contraction
3. Global Search
4. Other Approaches to MGP
Multi-Level Graph Partitioning
Successful in existing systems: Metis, Scotch, Jostle, …, KaPPa, KaHIP
Definitions
Definition (Matching)
A matching M ⊆ E is a set of edges not sharing any end point, i.e. G = (V, M) has maximum degree one.
A matching is maximal iff no edge can be added to the matching.
The weight of a matching is Σ_{e∈M} ω(e).
A maximum weight matching has the largest possible weight.
Matching-based Coarsening
[Figure: contracting a matched edge {a, b} into a single node a+b; parallel edges of weights A and B become one edge of weight A+B]
Matching-based Coarsening
Key: balance and cut are equal on both sides!
Note: moving coarse vertices → moving sets of fine vertices.
Contract Matchings
But how are matchings found?
Maximum Weight Matching
Theorem ([Gabow 1992])
A maximum weight matching can be found in time O(nm + n² log n).
Approximate Weighted Matching [Avis’83]
Greedy Algorithm:
M_Greedy := ∅
while E ≠ ∅ do
  take an edge {v, w} ∈ E with highest weight
  add {v, w} to M_Greedy
  remove all edges incident to v or w from E
Running time: O(m log n) (sorting edges)
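The greedy algorithm is a few lines; a minimal sketch with edges as (weight, u, v) tuples (the representation is my choice):

```python
def greedy_matching(edges):
    """Greedy 1/2-approximation: repeatedly take the heaviest remaining
    edge and discard all edges sharing an endpoint with it."""
    matched, matching = set(), []
    for w, u, v in sorted(edges, key=lambda e: -e[0]):  # heaviest first
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On a path with weights 5, 6, 5 it matches only the middle edge (weight 6) although the two outer edges (weight 10) are optimal, illustrating the 1/2 bound.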
Approximate Weighted Matching
Theorem ([Avis’83])
ω(M_Greedy) ≥ ½·ω(M_Opt) for non-negative ω
Proof:
let ω(e) be the weight of the first edge e selected by Greedy
after removing e and its incident edges, at most two edges of an optimal matching are removed
the sum of their weights cannot exceed 2ω(e)
repeat the argument for the remaining iterations
→ ω(M_Opt) / ω(M_Greedy) ≤ 2
Heavy Edge Matching: Fast and Simple
HEM Algorithm:
M_HEM := ∅
for v ∈ V in random order do
  take the heaviest edge {v, w} ∈ E
  add {v, w} to M_HEM
  remove all edges incident to v or w from E
Running time: O(m) (obvious?)
Quality: no performance guarantee!
Global Paths Algorithm: Approx. Weighted Matching [Maue, Sanders 2007]
Sort edges according to their weight/rating
Grow a set of paths and even-length cycles
Find an optimum matching for every path and cycle (dyn. programming)
Running time O(m + sort(m))
Global Paths Algorithm: Approx. Weighted Matching [Maue, Sanders 2007]
GPA Algorithm:
M_GPA := ∅
E′ := ∅
for each e ∈ E in descending order do
  if e is applicable then add e to E′
for each path or cycle P in E′ do
  M′ := MaximumWeightMatching(P)
  M_GPA := M_GPA ∪ M′
An edge is applicable if:
it connects two endpoints of different paths, or
it connects the two endpoints of an odd-length path
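The dynamic-programming subroutine for one path is tiny: choosing path edges so that no two adjacent edges are matched is a classic DP over the edge sequence. A sketch (function name my own):

```python
def path_matching(weights):
    """Maximum-weight matching on a path: weights[i] is the weight of
    the i-th path edge; adjacent edges cannot both be matched."""
    skip = take = 0        # best value without / with the previous edge
    for w in weights:
        skip, take = max(skip, take), skip + w
    return max(skip, take)
```

E.g. for edge weights [5, 6, 5] the DP correctly prefers the two outer edges (value 10) over the single middle edge.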
Approximate Weighted Matching
Theorem ([Maue, Sanders’06])
ω(M_GPA) ≥ ½·ω(M_Opt) for non-negative ω
Proof:→ homework
Theorem ([Maue, Sanders’06])
The approximation ratio of ½ is tight.
Proof:
ω(M_Opt) = (m/2)·c
ω(M_GPA) = (m/4)·(c + ε) = ½·ω(M_Opt) + ε′
Preis’ Principle [Preis’99]
Include any locally maximal edge (no incident edge is heavier)
Remove incident nodes and edges
Repeat
→ 1/2-approximation for maximum weight matching
Local Max Algorithm
[Hoepman 2004, distributed (graph = machine)]
Match all locally maximal edges in each iteration and remove their nodes
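A sequential sketch of one local max run; this is a simple quadratic-per-round version for illustration (ties are broken by comparing the whole (weight, u, v) tuple so that the edge order is total; names are my own):

```python
def local_max_matching(edges):
    """Local max: in each round, match every edge that dominates all
    remaining edges sharing an endpoint, then drop all edges touching
    matched nodes. With a total order, the selected edges of one round
    form a matching."""
    matching, alive = [], list(edges)        # edges as (weight, u, v)
    while alive:
        matched = set()
        for w, u, v in alive:
            # locally maximal: no incident edge beats it in the total order
            if all(not ({u, v} & {a, b}) or (w, u, v) >= (x, a, b)
                   for x, a, b in alive if (x, a, b) != (w, u, v)):
                matching.append((u, v))
                matched.update((u, v))
        alive = [(w, u, v) for w, u, v in alive
                 if u not in matched and v not in matched]
    return matching
```

Each round removes at least the globally heaviest edge, so the loop terminates; the expected-halving lemma below explains why few rounds suffice.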
Analysis
Worst case: neither fast nor efficient – Ω(n) iterations
Now:
maximal unweighted matching (artificial random edge weights)
“almost” an average-case analysis for max. weighted matching
(the analysis assumes independent edge weights in each iteration)
Analysis – Central Lemma
For random edge weights, the expected number of edges halves ineach iteration.
Consider one possible mark for each end-point of an edge (u, v);
(u, v) is marked at the v side iff v gets matched.
Observation: (u, v) is removed iff it gets at least one mark.
Matching (u, v) leads to d(u) + d(v) marks (d denotes node degree).
Random variable X_(u,v) := number of marks introduced by (u, v) (0 or d(u) + d(v))
Total number of marks: X := Σ_{e∈E} X_e
Claim: E[X] ≥ m ⇒ ≥ m/2 edges have a mark ⇒ ≥ m/2 edges are removed in expectation
[Figure: edge (u, v) whose matching marks its d(u)−1 and d(v)−1 other incident edges]
Proving the Claim E[X] ≥ m
E[X] = E[Σ_{e∈E} X_e] = Σ_{e∈E} E[X_e]
     = Σ_{(u,v)∈E} (d(u) + d(v)) · P[(u, v) is locally maximal]
     = Σ_{(u,v)∈E} (d(u) + d(v)) / (d(u) + d(v) − 1)
     ≥ Σ_{e∈E} 1 = m
Summary of Generic Analysis
logarithmic number of iterations → fast parallelization
geometrically decreasing work → linear overall work
[Figure: work per iteration decreases geometrically from m; log m iterations, O(m) total work]
Blueprint for Implementation
3 Passes Over Edges (u, v)
1. find locally maximal candidate edges C[v] for each node v
2. if C[u] = C[v], put (u, v) into the matching
3. if u or v is matched, remove (u, v)
Distributed Memory (MPI)
Distribute nodes over processors (assuming all degrees ≤ m/p).
1. find locally maximal candidate edges C[v] for each node v (local computation)
2. if C[u] = C[v], put (u, v) into the matching (exchange candidate information)
3. if u or v is matched, remove (u, v) (local computation)
Additionally: implementations for vertex-centric models, e.g. Apache Giraph, Pregel, …
Experiments
Greedy: try to match edges starting from the heaviest one (simple 1/2-approximation)
HEM [Karypis/Kumar, METIS]: heavy edge matching (fast sequential algorithm)
GPA [Maue, Sanders 2007]: global paths algorithm (high-quality 1/2-approximation)
RBM [Fagginger Auer and Bisseling 2012]: Red-Blue Matching (parallel algorithm without guarantees; not covered here)
Quality – Sparse Matrix Instances
[Plot: matching weight ratio relative to GPA vs. number of edges (2¹⁹ to 2²³); series: local max, Greedy, RBM, HEM (random)]
Sequential Time – Delaunay Instances
[Plot: time per edge [ns] vs. number of nodes (2¹⁶ to 2²²); series: RBM, GPA, Greedy, local max, HEM (random), HEM (original)]
Conclusions
Local max:
simple
scalable
robust high quality (do not trust simple heuristics like HEM)
provable performance for maximal matching
Contract Matchings
But how are the edges selected?
Matching Selection: Goals
1. large edge weights → sparsify
2. large #edges → few levels
3. uniform node weights → “represent” input
4. small node degrees → “represent” input
unclear objective: gap to approx. weighted matching, which only considers 1. and 2.
Our Solution: apply approx. weighted matching to a general edge rating function
Graph Partitioning: Edge Ratings
weight(u, v) := ω(u, v)
expansion(u, v) := ω(u, v) / (c(u) + c(v))
expansion*(u, v) := ω(u, v) / (c(u)·c(v))
expansion*2(u, v) := ω(u, v)² / (c(u)·c(v))
innerOuter(u, v) := ω(u, v) / (Out(v) + Out(u) − 2ω(u, v))
expansion*2-adist(u, v) := expansion*2(u, v) / φ_{u,v}
where c = node weight, ω = edge weight, Out(u) := Σ_{{u,v}∈E} ω(u, v)
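The ratings (except the adist variant, which needs the algebraic distance φ from the end of the chapter) are simple arithmetic per edge; a sketch computing all of them for one edge, with argument names of my own choosing:

```python
def edge_ratings(w, c_u, c_v, out_u, out_v):
    """Edge ratings for a single edge {u, v}: w = omega(u, v),
    c_* = node weights, out_* = total incident edge weight Out(*)."""
    return {
        'weight': w,
        'expansion': w / (c_u + c_v),
        'expansion*': w / (c_u * c_v),
        'expansion*2': w * w / (c_u * c_v),
        'innerOuter': w / (out_u + out_v - 2 * w),
    }
```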
Graph Partitioning: Edge Ratings
Edge Rating    avg.   best   avg. bal.   avg. t
expansion*2    2910   2819   1.025       1.29
expansion*     2914   2815   1.025       1.30
innerOuter     2914   2816   1.025       1.32
expansion      2940   2841   1.025       1.31
weight         3165   3010   1.026       1.40
Matching (end to end): GPA < HEM < GREEDY < RANDOM MATCHING. Local Max?
Evaluation: geometric mean of cuts over a set of instances/graphs.
Initial Partitioning
Usually done by recursive bipartitioning, e.g. using BFS
or spectral partitioning or even ILPs?
multiple tries pay off
iterate until k is reached
Open Problem:Direct k-partitioner that achieves better quality or speed.
Initial Partitioning
Usually done by recursive bipartitioning, e.g. using BFS
find nodes that are “far” away from each other:
1. random start node; perform BFS; the last node is the new start node
2. repeat until the distance converges
alternatively: assign nodes to blocks, refine using local search
Multi-Level Graph Partitioning
Successful in existing systems: Metis, Scotch, Jostle, DiBaP, …, KaPPa, KaHIP
Multilevel Graph Partitioning
1. Local Search
2. Contraction
3. Global Search
4. Other Approaches to MGP
Iterated Multilevel [Walshaw 2004]
[Figure: iterated multilevel: repeated contract / initial partitioning / uncontract cycles with local improvement]
don’t contract cut edges
adapt the previous solution as initial partitioning
cuts can only improve
V-cycles / F-cycles
Global Search: V-Cycles
[Figure: V-cycle: coarsening down, uncoarsening up; legend: graph not partitioned / graph partitioned]
Global Search: W-Cycles
[Figure: W-cycle: nested coarsening/uncoarsening passes; legend: graph not partitioned / graph partitioned]
Global Search: F-Cycles
[Figure: F-cycle: coarsening with intermediate uncoarsening steps; legend: graph not partitioned / graph partitioned]
Experiments
Experiments: Testset
Medium-sized instances:
graph        n         m
rgg17        2^17      1 457 506
rgg18        2^18      3 094 566
Delaunay17   2^17      786 352
Delaunay18   2^18      1 572 792
bcsstk29     13 992    605 496
4elt         15 606    91 756
fesphere     16 386    98 304
cti          16 840    96 464
memplus      17 758    108 384
cs4          33 499    87 716
pwt          36 519    289 588
bcsstk32     44 609    1 970 092
body         45 087    327 468
t60k         60 005    178 880
wing         62 032    243 088
brack2       62 631    733 118
finan512     74 752    522 240
bel          463 514   1 183 764
nld          893 041   2 279 080
af_shell9    504 855   17 084 020
Experimental Evaluation: Flows
Var.   (+F, −MB, −FM)    (+F, +MB, −FM)    (+F, −MB, +FM)    (+F, +MB, +FM)
α′     Δ cut %   t[s]    Δ cut %   t[s]    Δ cut %   t[s]    Δ cut %   t[s]
16     −1.88     4.17     0.81     3.92     6.14     4.30     7.21     5.01
8      −2.30     2.11     0.41     2.07     5.99     2.41     7.06     2.72
4      −4.86     1.24    −2.20     1.29     5.27     1.62     6.21     1.76
2     −11.86     0.90    −9.16     0.96     3.66     1.31     4.17     1.39
1     −19.58     0.76   −17.09     0.80     1.64     1.19     1.74     1.22
Ref. (−F, −MB, +FM): cut 2 974, t 1.13
final score of different configurations
α′: flow region upper bound factor
all values are improvements rel. to Ref.
+/− F ↔ +/− Flow; +/− MB ↔ +/− Most Balanced Heuristic; +/− FM ↔ +/− FM Algorithm
Experimental Evaluation: Flows – Effectiveness
Effectiveness   (+F, +MB, −FM)   (+F, −MB, +FM)   (+F, +MB, +FM)
α′              Δ cut %          Δ cut %          Δ cut %
16              −1.29             3.70             4.28
8               −1.12             4.16             4.74
4               −3.05             4.04             4.63
2               −8.26             3.02             3.36
1              −16.41             1.62             1.65
(−F, −MB, +FM)   2 833            2 831            2 827
each configuration gets the same amount of time
all values are improvements rel. to Ref.
+/− F ↔ +/− Flow; +/− MB ↔ +/− Most Balanced Heuristic; +/− FM ↔ +/− FM Algorithm
Experimental Evaluation: Global Search
Algorithm    Δ cut %   t[s]   Eff. Avg.
2 F-cycle    2.69      2.31   2 806
3 V-cycle    2.69      2.49   2 810
2 W-cycle    2.91      2.77   2 810
1 W-cycle    1.33      1.38   2 815
1 F-cycle    1.09      1.18   2 816
2 V-cycle    1.88      1.67   2 817
1 V-cycle    2 973     0.85   2 834
relatively fast basic configuration
only the global search is varied
red significantly < blue
red and blue significantly < reference
Experiments: Remove Components
Remove components step by step
Algorithm               Avg.    t      Eff. Avg.
KaFFPa Strong           2 683   8.93   2 636
−KWay                   2 682   9.23   2 636
−MoreLocalizedSearch    2 729   5.55   2 668
−FCycle                 2 748   3.27   2 669
−Flow                   2 934   1.66   2 799
Experimental Results: Comparison with Other Systems
Geometric mean, imbalance ε = 0.03:
11 graphs (78K–18M nodes) × k ∈ {2, 4, 8, 16, 64}
Algorithm (large graphs)   Best     Avg.     t[s]
KaFFPa strong              12 053   12 182   121.22
KaSPar strong              12 450   +3%      87.12
KaFFPa eco                 12 763   +6%      3.82
Scotch                     14 218   +20%     3.55
KaFFPa fast                15 124   +24%     0.98
kMetis                     15 167   +33%     0.83
Repeating Scotch as long as KaSPar strong runs and choosing the best result: 12.1% larger cuts
Instances: Walshaw instances, road networks, Florida Sparse Matrix Collection, random Delaunay triangulations, random geometric graphs
Multilevel Graph Partitioning
1. Local Search
2. Contraction
3. Global Search
4. Other Approaches to MGP
Other Approaches to MGP
diffusion-based partitioning
n-level graph partitioning
contraction by independent sets
contract clusterings
Sequential Graph Partitioning: Karlsruhe Sequential Partitioner
1. Contraction
contract a single edge between two levels → possibly n levels
fine-grained contraction → consecutive levels are very similar → no matching algorithm required
use of different edge ratings → uniform distribution of node weights
a priority queue defines the order in which edges are contracted
2. Local Search
efficient stopping criterion → avoid quadratic runtime
3. General
trial trees → improve quality by independent trials
KaSPar: Local Search
Nodes unmarked, active, markedactive nodes
compute gains of moving from one block toanotherchoose the maximum gain
when movedactive markedcan’t become active anymoreunmarked neighbours of marked active
v
u
⇒
v
u
KaSPar: Local Search - Stopping Criterion
touching each node on each level leads to Ω(n²) total runtime
⇒ need a flexible stopping criterion
model the gains in each step as independent, identically distributed random variables with expectation µ and variance σ²
compute µ and σ² from the previous steps
stop after p steps if p·µ² > α·σ² + β
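The stopping rule above can be sketched in a few lines; `alpha` and `beta` here are illustrative tuning parameters, not the values used by KaSPar:

```python
def should_stop(gains, alpha=1.0, beta=10.0):
    """Adaptive stopping rule: treat the observed gains as i.i.d. random
    variables, estimate their mean and variance from the steps seen so
    far, and stop the local search once p * mu^2 > alpha * sigma^2 + beta,
    where p is the number of observed steps."""
    p = len(gains)
    if p == 0:
        return False
    mu = sum(gains) / p
    var = sum((g - mu) ** 2 for g in gains) / p
    return p * mu * mu > alpha * var + beta
```

A long run of clearly negative gains triggers the stop quickly, while a mix of positive and negative gains (high variance, mean near zero) lets the search continue.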
KaSPar: Memory Overhead
addressable priority queue based on pairing heaps used for contraction
dynamic graph data structure
[Plot: memory overhead (factor 1–5) vs. number of edges m (10^4–10^8)]
Other Approaches to MGP
diffusion-based partitioning
n-level graph partitioning
contraction by dominating sets
contract clusterings
Matching-based Coarsening: Problem
bad for networks that are highly irregular
substantial reduction is hard using matchings
may contract the wrong edges!
Algebraic Distance: Relaxation Process
Goal: distinguish local and non-local edges

∀ i ∈ V : x_i^(0) := rand()
do k times: ∀ i ∈ V : x_i^(k) := (1 − α) x_i^(k−1) + α (Σ_j ω_ij x_j^(k−1)) / (Σ_j ω_ij)

Conjecture: If |x_i^(k) − x_j^(k)| > |x_u^(k) − x_v^(k)|, then the local connectivity between u and v is stronger than that between i and j.
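A minimal sketch of one round of the relaxation process, assuming unit edge weights and no isolated nodes:

```python
import random

def algebraic_distances(n, edges, k=10, alpha=0.5, seed=0):
    """Start from random values and perform k Jacobi-style relaxation
    steps x_i <- (1 - alpha) * x_i + alpha * (average over neighbours);
    return |x_u - x_v| per edge. Unit edge weights for simplicity."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for _ in range(k):
        # the whole right-hand side uses the old vector (Jacobi update)
        x = [(1 - alpha) * x[i] + alpha * sum(x[j] for j in adj[i]) / len(adj[i])
             for i in range(n)]
    return {(u, v): abs(x[u] - x[v]) for u, v in edges}
```

On a graph of two triangles joined by a bridge, the bridge edge typically keeps a larger difference than the intra-triangle edges, matching the conjecture.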
Algebraic Distance: Example
goal: distinguish local and non-local edges
Algebraic Distance: Example - random initialization
assign random values
relaxation steps to equalize
Algebraic Distance: Example - 10 relaxation steps
relaxation steps to equalize
local differences equalize faster
Stationary iterative relaxation
rewrite the process as x^(k+1) = H_JOR x^(k), where

H_JOR = (1 − α) I + α D^(−1) W

⇒ essentially the JOR method to solve Lx = 0
L = D − W is the graph Laplacian, α the relaxation parameter
algebraic distance ↔ JOR smoothing property
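Expanding the matrix form componentwise recovers exactly the relaxation update from before, with weighted degree d_i = Σ_j ω_ij:

```latex
(H_{\mathrm{JOR}}\,x)_i
  = (1-\alpha)\,x_i + \alpha\,(D^{-1}Wx)_i
  = (1-\alpha)\,x_i + \alpha\,\frac{\sum_j \omega_{ij}\,x_j}{d_i},
  \qquad d_i = \sum_j \omega_{ij}.
```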
H_JOR is the lazy random walk matrix.
Definition (extended 2-normed algebraic distance)
After k iterations of x^(k+1) = H x^(k) on R random initializations:

ρ_ij^(k) := ( Σ_{r=1..R} |x_i^(k,r) − x_j^(k,r)|² )^(1/2) if {i, j} ∈ E, otherwise ρ_ij^(k) := 0
Weighted Aggregation (inspired by AMG)
each node may be divided into fractions
different fractions form a coarse node
1. how to select seed nodes
2. how is the classical likelihood expressed
3. modification for graph partitioning
Coarse Nodes: C-points Selection
dominating set C ⊂ V : ∀ v ∈ V \ C : N(v) ∩ C ≠ ∅
find a dominating set C ⊂ V s.t. F = V \ C is "strongly coupled" to C
each node in C is a seed of a coarse aggregate
Coarse Nodes: C-points Selection
find a dominating set C ⊂ V s.t. F = V \ C is "strongly coupled" to C
each node in C is a seed of a coarse aggregate

C := ∅, F := V
until ∀ u ∈ F : Σ_{v ∈ C ∩ N(u)} 1/ρ_{u,v} ≥ θ · Σ_{v ∈ N(u)} 1/ρ_{u,v} : transfer a node from F to C
θ ∈ (0, 1), usually θ = 0.5

Theorem: If the constraint is fulfilled, then C is a dominating set.
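A sketch of the greedy seed selection under the coupling constraint; `rho` stands for the algebraic distances, and picking the first violating node (rather than a rating-based order) is a simplification:

```python
def select_seeds(nodes, neighbors, rho, theta=0.5):
    """Move nodes from F to C until every remaining F-node u satisfies
    sum_{v in C cap N(u)} 1/rho(u,v) >= theta * sum_{v in N(u)} 1/rho(u,v),
    i.e. u is strongly coupled to C."""
    C, F = set(), set(nodes)

    def coupled(u):
        total = sum(1.0 / rho(u, v) for v in neighbors[u])
        to_c = sum(1.0 / rho(u, v) for v in neighbors[u] if v in C)
        return to_c >= theta * total

    changed = True
    while changed:
        changed = False
        for u in sorted(F):
            if not coupled(u):  # u not strongly coupled yet: make it a seed
                F.discard(u)
                C.add(u)
                changed = True
                break
    return C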
Coarse Nodes: classical AMG interpolation matrix P
express the likelihood of v belonging to c ∈ C → P ∈ R^(|V|×|C|)
nodes may belong to several aggregates

P_{v,c} = (1/ρ_{v,c}) / (Σ_{k ∈ N(v) ∩ C} 1/ρ_{v,k}) if v ∈ F and c ∈ N(v) ∩ C
P_{v,c} = 1 if v ∈ C and v = c
P_{v,c} = 0 otherwise
Coarse Nodes: classical AMG interpolation matrix P
coarse Laplacian: L_c = P^t L_f P
weight of a coarse node q ∈ C: c(q) := Σ_v c(v) P_{v,q} (the total weight stays constant)
coarse edge weights: w_{c1,c2} = Σ_{k ≠ l} P_{k,c1} w_{k,l} P_{l,c2}
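The coarse edge weight formula can be evaluated with a hash map over coarse-node pairs; storing `P` sparsely as node → {coarse node: fraction} is an assumed representation:

```python
def coarse_edge_weights(P, w):
    """Accumulate w_{c1,c2} = sum_{k != l} P[k][c1] * w[k][l] * P[l][c2]
    for an undirected fine graph. P maps node -> {coarse node: fraction};
    w maps undirected edges (u, v) -> weight. Self-loops on the coarse
    level (c1 == c2) are dropped."""
    coarse = {}
    for (k, l), wkl in w.items():
        for c1, pk in P[k].items():
            for c2, pl in P[l].items():
                if c1 == c2:
                    continue
                key = (min(c1, c2), max(c1, c2))  # hash on the coarse pair
                coarse[key] = coarse.get(key, 0.0) + pk * wkl * pl
    return coarse
```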
Coarse Nodes: modified AMG interpolation matrix P
express the likelihood of v belonging to c ∈ C → P ∈ R^(|V|×|C|)
nodes may belong to several aggregates
previous (matching-based) work: vertices should not become too heavy, coarse levels should stay sparse
→ modified algorithm to compute the likelihood P
Modified Algorithm: Likelihood for v ∈ V

if v ∈ C then P_{v,v} = 1
else: find the max pair (1/ρ_{e1} + 1/ρ_{e2}) s.t. the coarse vertices are not overloaded when splitting v
...if no such pair: find the max edge (1/ρ_{e1}) s.t. the coarse vertex is not overloaded when aggregating v
......if no such edge: move v to C

N_v^c ← selected C-nodes
P_{v,c} ← (1/ρ_{v,c}) / (Σ_{k ∈ N_v^c} 1/ρ_{v,k}), c ∈ N_v^c

restrict to at most the two strongest connections → sparsity
avoid overloaded vertices
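A sketch of the restricted connection selection; picking the two connections greedily by 1/ρ (rather than the best admissible pair) and the `capacity`/`load` model of "overloaded" aggregates are simplifications:

```python
def restricted_interpolation(v, candidates, rho, capacity, load):
    """Keep at most the two strongest admissible connections of v
    (largest 1/rho, aggregates not overloaded); fall back to one;
    return None to signal that v should become a C-node itself."""
    ok = [c for c in candidates if load[c] < capacity]
    ok.sort(key=lambda c: 1.0 / rho(v, c), reverse=True)
    if ok:
        return ok[:2]  # at most two strongest connections
    return None  # no admissible connection: move v to C
```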
Experiments: Algorithm Configurations

ECO: KaFFPaEco, a good trade-off between quality and runtime
ECO-ALG: matching-based coarsening, refinement as in ECO, rating function ex_alg(e) := expansion∗2(e)/ρ_e
STRONG: matching-based coarsening, strong refinement scheme
AMG-ECO: AMG-based coarsening, refinement as in ECO
AMG: AMG-based coarsening, refinement as in STRONG

where expansion∗2(u, v) := ω(u,v)² / (c(u) c(v))
Experimental Results: Social Networks - Algebraic Distance Helps!

ECO vs. ECO-ALG:
k   quality  full time
2   1.38     0.77
4   1.24     1.32
8   1.15     1.29
16  1.09     1.27
32  1.06     1.18
64  1.06     1.13
Experimental Results: Social Networks - AMG Helps!

ECO-ALG vs. AMG-ECO:
k   quality  uncoar. time
2   1.16     3.62
4   1.11     2.14
8   1.07     1.94
16  1.06     1.69
32  1.00     1.60
64  1.01     2.99
Experimental Results: Potentially Hard Graphs
potentially hard mixtures (star-like structures)
mixtures of social networks, FEM graphs, VLSI, ...
S_0, . . . , S_t, connected to the center S_0 by ≤ 3% random edges

ECO vs. ECO-ALG:
k  quality  full time
2  1.42     0.51
4  1.15     0.88
8  1.12     1.08
Experimental Results: Potentially Hard Graphs
potentially hard mixtures (star-like structures)
mixtures of social networks, FEM graphs, VLSI, ...
S_0, . . . , S_t, connected to the center S_0 by ≤ 3% random edges

   ECO-ALG vs. AMG-ECO      STRONG vs. AMG
k  quality  uncoar. time    quality  uncoar. time
2  1.18     0.55            1.15     2.11
4  1.23     0.64            1.13     1.69
8  1.08     0.98            1.05     1.37
Other Approaches to MGP
diffusion-based partitioning
n-level graph partitioning
contraction by dominating sets
contract clusterings
Contraction of Clusterings
aggressive contraction / simple and fast local search
main idea: contract clusterings
clustering paradigm: internally dense and externally sparse
Contraction of Clusterings
contraction: respect balance and cut
avoid large blocks: size constraint U
construct the coarse graph using hashing
recurse until the graph is small
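Coarse-graph construction by hashing can be sketched as follows: each cluster becomes one coarse node, parallel edges between clusters are merged by hashing on the coarse-node pair, and intra-cluster edges disappear:

```python
def contract_clustering(cluster, edges):
    """cluster maps node -> cluster id; edges maps undirected (u, v) -> weight.
    Returns the coarse edges with accumulated weights."""
    coarse_edges = {}
    for (u, v), w in edges.items():
        cu, cv = cluster[u], cluster[v]
        if cu == cv:
            continue  # edge inside a cluster: contracted away
        key = (min(cu, cv), max(cu, cv))  # hash on the coarse pair
        coarse_edges[key] = coarse_edges.get(key, 0) + w
    return coarse_edges
```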
Label Propagation: Cut-based, Linear-Time Clustering Algorithm
cut-based clustering using size-constrained label propagation
start with singletons
traverse nodes in random order or smallest degree first
move each node to the cluster with the strongest eligible connection
eligible: w.r.t. size constraint U
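A sketch of size-constrained label propagation with singleton initialization; unit node weights, the number of rounds, and the tie-breaking are illustrative choices:

```python
import random
from collections import defaultdict

def size_constrained_label_propagation(adj, U, rounds=3, seed=0):
    """adj maps node -> list of (neighbour, edge weight). Scan the nodes
    in random order and move each node to the neighbouring cluster with
    the strongest connection, provided that cluster stays within size U."""
    rng = random.Random(seed)
    label = {v: v for v in adj}   # singletons
    size = {v: 1 for v in adj}
    order = list(adj)
    for _ in range(rounds):
        rng.shuffle(order)
        for v in order:
            conn = defaultdict(int)
            for u, w in adj[v]:
                conn[label[u]] += w  # connection strength per cluster
            best = max(conn, key=conn.get, default=label[v])
            if best != label[v] and size[best] + 1 <= U:  # eligible move
                size[label[v]] -= 1
                size[best] += 1
                label[v] = best
    return label
```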
Label Propagation
Iteration  Cut [%]
0          100
1          8.96
2          6.15
3          5.66
4          5.44
5          5.28
6          5.25
7          5.21
8          5.18
...        5.09
Label Propagation: Simple Local Search
Greedy Local Search:
start with the partition from the coarser level
traverse nodes in random order
move each node to the cluster with the strongest eligible connection
eligible: w.r.t. size constraint U := (1 + ε)·|V|/k
Algorithmic Augmentations - Label Propagation: Active Nodes Approach
speed up label propagation in later rounds
assume absence of the size constraint U

Key observation: a node can only change its cluster if ≥ 1 neighbour changed its cluster.

a node is active if ≥ 1 neighbour changed its cluster in the previous round
only consider active nodes
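The active-nodes bookkeeping can be sketched independently of the actual move rule; `try_move` is a hypothetical callback that attempts a move for a node and reports whether it changed cluster:

```python
def active_node_rounds(adj, try_move):
    """Run label propagation rounds, but in round r only scan nodes with
    at least one neighbour that changed its cluster in round r-1.
    adj maps node -> list of neighbours."""
    active = set(adj)           # round 0: everyone is active
    while active:
        changed = set()
        for v in list(active):
            if try_move(v):
                changed.add(v)
        # next round: only neighbours of changed nodes are active
        active = {u for v in changed for u in adj[v]}
```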
Algorithmic Augmentations: Ensemble Clusterings
combine multiple clusterings
base clusterings indicate which nodes should belong to the same cluster
combination intuition:
if nodes are in the same block in all clusterings → keep them in the same block
else: the clusterings do not agree → split
construction: linear scan + hashing
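The overlay construction by a linear scan plus hashing can be sketched as follows: two nodes end up in the same overlay block iff they carry the same label in every base clustering:

```python
def overlay_clustering(clusterings):
    """clusterings is a list of dicts node -> label, all over the same
    node set. Hash the tuple of a node's labels across all base
    clusterings to a fresh overlay block id."""
    overlay, block_id = {}, {}
    for v in clusterings[0]:
        key = tuple(c[v] for c in clusterings)
        if key not in block_id:
            block_id[key] = len(block_id)  # new overlay block
        overlay[v] = block_id[key]
    return overlay
```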
Algorithmic Augmentations: Iterated Multilevel [Walshaw 2004]
[Figure: iterated multilevel scheme - contract, initial partitioning, uncontract + local improvement, repeated]
don't contract cut edges
adapt the previous solution as the initial partitioning
cluster coarsening scheme: modify label propagation to respect the partition
Algorithmic Augmentations: Allowing More Imbalance on Coarse Levels
more imbalance: local search has more freedom
[Figure: multilevel scheme with relaxed balance constraint (1 + ε + ε′)·|V|/k on coarse levels; ε′ is decreased slowly to 0 during uncoarsening, recovering the constraint (1 + ε)·|V|/k]
Instances
Large Graphs:
graph                 n        m
p2p-Gnutella04        6 405    29 215
wordassociation-2011  10 617   63 788
PGPgiantcompo         10 680   24 316
email-EuAll           16 805   60 260
as-22july06           22 963   48 436
soc-Slashdot0902      28 550   379 445
loc-brightkite        56 739   212 945
enron                 69 244   254 449
loc-gowalla           196 591  950 327
coAuthorsCiteseer     227 320  814 134
wiki-Talk             232 314  ≈1.5M
citationCiteseer      268 495  ≈1.2M
coAuthorsDBLP         299 067  977 676
cnr-2000              325 557  ≈2.7M
web-Google            356 648  ≈2.1M
coPapersCiteseer      434 102  ≈16.0M
coPapersDBLP          540 486  ≈15.2M
as-skitter            554 930  ≈5.8M
amazon-2008           735 323  ≈3.5M
eu-2005               862 664  ≈16.1M
in-2004               ≈1.3M    ≈13.6M

Huge Graphs:
uk-2002               ≈18.5M   ≈262M
arabic-2005           ≈22.7M   ≈553M
sk-2005               ≈50.6M   ≈1.8G
uk-2007               ≈106M    ≈3.3G

Machine A: 2x Intel Xeon E5-2670, 2.6 GHz, 64 GB RAM
Machine B: 4x Intel Xeon E5-4640, 2.4 GHz, 1 TB RAM
ε = 3%, k ∈ {2, 4, 8, 16, 32, 64}, geometric mean of averages, 10 seeds
Experimental Results: Large Graphs - Eco Local Search

Algorithm    improvement  t [s]
CEcoR        19.6%        10.2
CEco         27.8%        8.6
CEcoV        30.1%        14.3
CEcoV/B      33.0%        15.5
CEcoV/B/E    32.7%        46.6
CEcoV/B/E/A  32.1%        41.9
KaFFPaEco    85 920.07 (reference cut)  36.2

Abbreviations:
CEco  Eco with the new coarsening, same local search
V  V-cycles
B  add. imbalance
E  ensemble clusterings
A  active nodes
R  random node ordering

Per algorithmic component:
cluster coarsening improves speed and quality
degree node ordering improves speed and quality
V-cycles improve quality
add. imbalance improves quality
ensembles are sometimes better, sometimes worse
active nodes improve speed
Experimental Results: Large Graphs - LP Local Search

Algorithm     improvement  t [s]
CFastR        15.5%        4.7
CFast         24.8%        3.9
CFastV        27.1%        5.7
CFastV/B      21.8%        5.8
CFastV/B/E    24.6%        28.4
CFastV/B/E/A  24.6%        24.4
KaFFPaEco     85 920.07 (reference cut)  36.2

Abbreviations:
CFast  CEco with LP local search
V V-cycles, B add. imbalance, E ensemble clusterings, A active nodes, R random node ordering

Per algorithmic component:
cluster coarsening improves speed and quality
degree node ordering improves speed and quality
V-cycles improve quality
add. imbalance improves quality
ensembles improve quality
active nodes improve speed
Experimental Results
Algorithm     impr. cut  t [s]
CFast         52.4%      3.9
CFastV        55.3%      5.7
CEcoV/B       62.5%      15.5
UFast         51.7%      1.5
UFastV        54.7%      3.0
UEcoV/B       60.9%      11.5
UStrong       75.1%      296.4
KaFFPaEco     22.2%      36.2
KaFFPaStrong  66.2%      640.8
kMetis        45.8%      0.4
hMetis        60.5%      107.4
Scotch        104 954.86 (reference cut)  10.6

Abbreviations:
U  use the new coarsening also in the initial partitioning
V V-cycles, B add. imbalance, E ensemble clusterings, A active nodes, R random node ordering

UEcoV/B outperforms hMetis with an order of magnitude less time
UStrong: best quality, running time ≤ KaFFPaStrong
the best cut of hMetis is 6% worse than the average cut of UStrong
Huge Graphs: Performance on huge networks for k = 16 blocks

           arabic-2005                  uk-2002
algorithm  avg. cut  best cut  t [s]    avg. cut  best cut  avg. t [s]
UFast      1.91M     1.87M     111.2    1.47M     1.43M     71.7
UFastV     1.85M     1.79M     334.3    1.43M     1.39M     215.9
kMetis     3.58M     3.5M      99.6     2.46M     2.41M     63.7

           sk-2005                      uk-2007
UFast      23.01M    20.34M    387.1    4.34M     4.10M     626.5
UFastV     19.82M    18.18M    1166.4   4.19M     3.99M     1756.4
kMetis     19.43M    18.56M    405.3    11.44M    10.86M    827.6

on average: 74% fewer edges cut
uk-2007 can be partitioned in ≈ 10.5 min (in parallel: 15.2 s)
cuts less than half as many edges as kMetis
UFastV always achieves the best cut
Conclusion
cluster contraction improves running time and solution quality
LP local search is a good alternative if the initial partitioning is good
Later: the algorithm offers large potential for parallelization
Later: the algorithm offers large potential for externalization
Hypergraphs?
Lessons Learned: So far ...
What is GP, GC?
Objective functions and principles
Applications
Why are GP, GC hard?
Exact solutions: ILPs
Spectral techniques for GP and GC
Local search, multilevel approach
Contraction algorithms: matching-, clustering-, and dominating-set-based
Lecture Overview
Fundamentals, Problem Definitions and Objective Functions
Lots of Applications
NP-Hardness of GP and GC
Exact Partitioning/Clustering
Spectral Partitioning/Clustering
Local Search, Multilevel Algorithms
Parallel, External and Semi-External Algorithms
Greedy Agglomeration / Top-Down Approaches
Min-Cut Tree Clustering
Evolutionary Algorithms and Meta-Heuristics
Dynamic Clustering, Online Algorithms
Parallel (ML) Graph Partitioning
Use case:
parallel processing → use the available processing power
speed up the computations of the graph partitioner
partition graphs that do not fit into the internal memory of one machine

A parallel ML graph partitioner needs:
parallel coarsening
parallel initial partitioning
parallel local search
Parallel Graph Partitioning
1. Recursive Coordinate Bisection
2. ParMetis
3. KaPPa
4. For Social Networks
5. Facebook's Approach
Recursive Coordinate Bisection [Berger, Bokhari'87]
Assumption: the simulation yields coordinates for the vertices

Algorithm:
find the coordinate axis with the longest expansion (x, y, or z) → t
sort the vertices by their t-coordinate (or find the median)
split into halves, recurse

Parallelization? simple!
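A sketch of recursive coordinate bisection for k a power of two; median splitting via full sorting keeps the code short (a selection algorithm would avoid the sort):

```python
def recursive_coordinate_bisection(points, k):
    """points is a list of (id, coords) tuples. Find the axis with the
    longest expansion, split the point set at the median of that
    coordinate, and recurse until k parts remain. Returns k id-lists."""
    if k == 1:
        return [[pid for pid, _ in points]]
    dim = len(points[0][1])
    # axis with the longest expansion
    axis = max(range(dim),
               key=lambda d: max(c[d] for _, c in points) - min(c[d] for _, c in points))
    pts = sorted(points, key=lambda p: p[1][axis])
    mid = len(pts) // 2
    return (recursive_coordinate_bisection(pts[:mid], k // 2)
            + recursive_coordinate_bisection(pts[mid:], k // 2))
```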
Recursive Coordinate Bisection [Berger, Bokhari'87]
Pro:
simple, easy to implement
fast, similar to k-d-trees
Con:
needs coordinates
no control over communication costs
ParMetis: Parallel Multilevel Graph Partitioning
Metis meant "cunningness" or "wisdom, craft, skill" in Ancient Greek.

General:
good and fast for mesh-type networks
bad for social networks (quality and running time)

Overview/Algorithmic Components:
compute a vertex coloring on each level → resolve conflicts
parallel matching: only match vertices within a color class
parallel initial partitioning: each PE performs IP on the coarsest graph, the best result is used
parallel local search: only move vertices within a color class
Graph Distribution over PEs
Graph Distribution: a PE receives n/p vertices and their edges
ghost nodes: adjacent nodes on another processor (communication!)
interface nodes: nodes adjacent to ghost nodes
Compute Independent Sets in Parallel
Definition (Independent Set)
An independent set of a graph G = (V, E) is a subset S ⊆ V such that {u, v} ∉ E for all u, v ∈ S. Objective: maximize |S|.
Luby's Algorithm: [Luby'86]
assign a random number to each v ∈ V
if num(v) < num(u) for all adjacent vertices u, then add v to I
remove I and its adjacent vertices from G
repeat until the graph is empty
Theory: O(log |V |) iterations until I maximal
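A minimal sequential sketch of Luby's round structure (illustrative only; the distributed-memory version on the next slide additionally communicates the random numbers and graph changes between PEs):

```python
import random

def luby_mis(adj, seed=0):
    """Luby-style maximal independent set: in each round a vertex joins the
    set if its random number is smaller than those of all remaining
    neighbours; chosen vertices and their neighbours are then removed."""
    rng = random.Random(seed)
    alive = set(adj)                      # vertices still in the graph
    mis = set()
    while alive:
        num = {v: rng.random() for v in alive}
        # local minima join the independent set (the global minimum always
        # qualifies, so every round makes progress)
        chosen = {v for v in alive
                  if all(num[v] < num[u] for u in adj[v] if u in alive)}
        mis |= chosen
        removed = set(chosen)
        for v in chosen:
            removed |= {u for u in adj[v] if u in alive}
        alive -= removed
    return mis

# 5-cycle
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
mis = luby_mis(adj)
assert all(u not in adj[v] for v in mis for u in mis)   # independence
assert all(v in mis or any(u in mis for u in adj[v]) for v in adj)  # maximal
```

Each while-iteration corresponds to one synchronized round in the parallel setting, which is where the O(log |V|) bound applies.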
Compute Independent Sets in Parallel — Distributed Memory Implementation
Luby's Algorithm (distributed): [Luby'86]
assign a random number to each v ∈ V
communicate the local numbers to adjacent PEs
if num(v) < num(u) for all adjacent vertices u, then add v to I
remove I and its adjacent vertices from G
communicate the graph changes
repeat until the graph is empty
Compute Vertex Colorings in Parallel
Definition (Vertex Coloring)
A vertex coloring of a graph G = (V, E) is a map C : V → {1, …, ℓ} such that C(u) ≠ C(v) for all {u, v} ∈ E. Objective: minimize the number of colors.
Greedy Algorithm:
init color c = 0
compute an independent set I (e.g., Luby's algorithm, distributed)
color I with c, increment c
remove I from G
communicate the graph changes
repeat until the graph is empty
Compute Vertex Colorings in Parallel — Practical Implementation in ParMetis
Reduce the overall running time:
perform only one augmentation step of Luby's algorithm per color
→ the independent set is not maximal → more colors
stop when a large fraction is colored → not all nodes are colored
→ most nodes still participate at every level in coarsening and refinement
→ limits the required number of synchronization steps significantly
Parallel Coarsening in ParMetis
Given: current level G, colored using Luby's algorithm
Algorithm: in the c-th iteration, unmatched vertices with color c choose a matching partner with the heavy-edge heuristic
Analysis:
assume (v, u) is selected by v
since u has a different color, u is not choosing a matching partner
however, it is possible that ∃w that selects (w, u)
FIX: gather the match requests; if there are multiple requests for u, take the one with the highest edge weight; the other requests remain unmatched
Figure: three phases — Request, Communicate, Choice
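The conflict-resolution step ("take the highest-weight request") can be sketched as follows; the helper below is made up for illustration and is not ParMetis code:

```python
def resolve_requests(requests):
    """requests: list of (v, u, weight) match requests directed at u.
    For each target u, accept only the heaviest request; every other
    requesting vertex remains unmatched in this iteration."""
    best = {}
    for v, u, w in requests:
        if u not in best or w > best[u][1]:
            best[u] = (v, w)
    return {u: v for u, (v, w) in best.items()}

# vertices 1 and 2 both request vertex 3; the heavier edge (2, 3) wins
matches = resolve_requests([(1, 3, 2.0), (2, 3, 5.0)])
assert matches == {3: 2}
```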
Parallel Coarsening in ParMetis
Given: current level G, matching M
Contraction:
each PE knows how many vertices it needs to send and to receive
exchange subgraphs / adjacency lists
merge adjacency lists (perform the contraction locally)
Initial Partitioning:
collect the coarsest level and distribute it to all PEs
perform independent initial partitioning (sequential)
adopt the best result on all PEs
Parallel Uncoarsening in ParMetis
Given: current level G, partition of the vertices, coloring
Greedy Local Search (Sequential):
traverse the nodes randomly
move a node to a block that decreases the cut, subject to the balance constraint
Greedy Local Search (Parallel):
retain the spirit of the sequential algorithm
c phases (c = number of colors)
phase i: only vertices with color i are considered
Analysis (Parallel):
vertices of the same color form an independent set
→ the total reduction when moving all vertices at the same time is the same as when moving them one by one
Maintain the Balance Constraint
initially, each PE knows the weight of all k blocks
during each phase the constraint is enforced using these weights
each PE: for each vertex moved, update the local weights
at the end of each subphase: recompute the block weights
Approx. Weighted Matching
Use a parallel prepartitioner (optional); currently geometric recursive bipartitioning when coordinates are available
Global Path Algorithm [Maue, Sanders 2007] locally
parallel algorithm [Manne, Bisseling 2007] for border nodes (local max!)
Initial Partitioning
Every PE can perform initial partitioning using different seeds
Compared PARTY, PMETIS, and SCOTCH; SCOTCH yielded the best results (at the time)
Figure: multilevel scheme — input graph → match/contract → initial partitioning → local improvement/uncontract → output partition
Local Improvement – Basic Approach
Use linear-time local search [Fiduccia, Mattheyses'82] on multiple pairs of blocks in parallel
Definitions
Edge coloring:
1. assign a color to each edge of a graph
2. no vertex has two incident edges of the same color
3. use as few colors as possible
Note: each color class of an edge coloring induces a matching
Parallel Local Search
Restriction: the number of processors (PEs) equals the number of blocks
Algorithm:
one PE per block
compute an edge coloring of the quotient graph
iterate over all matchings in the coloring:
processor pairs exchange blocks
perform pairwise local search
adopt the best solution
Different Graph Data Structure:
nodes can migrate instead of being static
PE p holds block V_p
initially: each PE holds a static graph DS for its own block
migration of nodes: hashtable (for the nodes) + second edge array
Distributed Graph Data Structure — PE-centric view
Figure: vertex array V, edge array E, second edge array E′, hashtable H
hashtable: store begin and end pointers into E′
use hashing with linear probing instead of std::unordered_map
Parallel Local Search
Data: G = (V, E), initial partition P
distribute G according to P
compute the quotient graph Q = (V_Q, E_Q) in parallel
compute an edge coloring C : E_Q → S of Q in a distributed way
forall colors c ∈ C do
    foreach e = {P_i, P_j} ∈ M_C(c) in parallel do
        perform local two-way refinement between P_i and P_j
Example:
a sample graph with initial partition (edge cut EC = 17)
the quotient graph
compute an edge coloring
Parallel Local Search — after local search (cut = 16)
Parallel Computation of Q
Algorithm:
each vertex has an ID
each PE has a range of vertices [v_min, v_max]
distribute a table among the PEs ⇒ Φ : ID → BlockID
locally: iterate over the edge list
each edge has a target ID ⇒ target's block = Φ(target ID)
add a local quotient-graph edge if the target's block ≠ own block
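The table-lookup construction of Q can be sketched as a sequential illustration of the per-PE work (function and variable names are made up):

```python
def quotient_edges(block_of, adj):
    """Compute the quotient-graph edges: for every vertex, look up the block
    of each neighbour (the table Phi: vertex id -> block id) and record an
    edge whenever the target's block differs from the own block."""
    q = set()
    for v, neighbours in adj.items():
        for u in neighbours:
            if block_of[u] != block_of[v]:
                q.add((min(block_of[v], block_of[u]),
                       max(block_of[v], block_of[u])))
    return q

# 4-cycle split into two blocks {0, 1} and {2, 3}
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
block_of = {0: 0, 1: 0, 2: 1, 3: 1}
assert quotient_edges(block_of, adj) == {(0, 1)}
```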
Edge Colorings — Distributed Computation
Algorithm 3: Edge Coloring Algorithm (Sequential)
Data: G = (V, E)
Result: edge coloring of G
forall e ∈ E, in random order, do
    color e with the smallest free color
Worst case: 2∆ − 1 colors, with ∆ := max_{v∈V} deg(v)
Proof?
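A sequential first-fit sketch that illustrates the bound: an edge {u, v} sees at most (deg(u) − 1) + (deg(v) − 1) ≤ 2∆ − 2 occupied colors, so one of the first 2∆ − 1 colors is always free (illustrative Python, not the lecture's code):

```python
def greedy_edge_coloring(edges):
    """First-fit edge coloring: color each edge with the smallest color not
    used by any edge sharing an endpoint; never needs more than
    2*Delta - 1 colors."""
    used = {}                      # vertex -> set of colors at that vertex
    coloring = {}
    for u, v in edges:
        taken = used.setdefault(u, set()) | used.setdefault(v, set())
        c = 0
        while c in taken:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring

# star with 3 leaves: Delta = 3, all edges share vertex 0
col = greedy_edge_coloring([(0, 1), (0, 2), (0, 3)])
assert sorted(col.values()) == [0, 1, 2]
assert max(col.values()) <= 2 * 3 - 1
```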
Algorithm 4: Distributed Edge Coloring Algorithm
Data: Q = (V_Q, E_Q)
foreach v ∈ V_Q in parallel do
    initialize the free color list L with C = 2∆ − 1 colors
while E_Q ≠ ∅ do
    foreach v ∈ V_Q in parallel do
        throw a coin; if the coin shows head and deg(v) > 0, then v is active
        if v is active then
            pick a random uncolored incident edge (v, w) ∈ E_Q
            send L to node w
            reply with REJECT to all incoming message lists L′
            wait for a reply to the list
            if the reply is ACCEPT(c) then
                color (v, w) with color c
                remove c from L
        else
            foreach message L′ from a neighbor u do
                c := smallest color in L′ ∩ L
                assign color c to edge (u, v)
                send ACCEPT(c) to u
                remove c from L
Distributed First Fit — PE-Centric View
Algorithm:
throw a coin (→ PE is active or inactive)
initialize the free color lists
active:
    choose a random uncolored neighbor
    send the free color list, wait for a reply
    the reply is ACCEPT(c) or REJECT
    send REJECT to all incoming requests
inactive:
    work on all incoming messages
    color the edge with the smallest color c in the intersection of the free color lists
    reply with ACCEPT(c)
Distributed First Fit — Example
1. initial uncolored quotient graph
2. throw a coin (→ active/inactive)
3. initialize the free color lists
4. choose a random neighbor
5. active PEs reply with REJECT
6. PE 4: compute the smallest free color in the intersection
7. throw a coin again; PE 4 replies ACCEPT(0) and colors the edge; PE 1 receives 0 and colors the edge
8. if all local edges are colored → stay inactive
(the remaining animation frames repeat these steps until every quotient-graph edge is colored)
Coloring As Communication Protocol
A Processor Pair Refinement:
1. initial situation: the blocks are stored locally on CPU P_i and CPU P_j
2. exchange data
3. both PEs do local search and exchange their improvements; the PE with the larger improvement is the winner
4. the winner sends back its changes
Reducing Communication Volume
Assumption: changes happen in a small area around the boundary
Idea: exchange only a small area around the boundary
Find this area using a BFS initialized with all boundary nodes
Scalability — Definitions
Definition (Weak Scaling)
How does the running time vary with the number of processors for a fixed problem size per processor?
Definition (Strong Scaling)
How does the running time vary with the number of processors for a fixed total problem size?
Strong scalability is usually harder to achieve!
Scalability
Figure: total time [s] versus number of blocks k (4–1024) for KaPPa-Strong, KaPPa-Fast, KaPPa-Minimal, scotch, kmetis, and parmetis.
Street network Europe (|V| = 18M, |E| = 44M)
Quality — Comparison with Other Systems
Geometric mean, imbalance ε = 0.03: 14 graphs (78K–18M nodes) × k ∈ {2, 4, 8, 16, 64}
Variant         avg. cut   balance   avg. time [s]
KaPPa-Strong    24 227     1.028     36.93
KaPPa-Fast      +2%        1.028     21.40
KaPPa-Minimal   +3%        1.028     5.94
seq. scotch     +7%        1.027     5.95
kmetis          +18%       1.026     0.79
parmetis        +30%       1.041     0.59
Walshaw instances, Road Networks, Florida Sparse Matrix Collection,Delaunay triangulations, Random Geometric Graphs, Social Networks.
Quality records: at submission time, many improvements in the Walshaw benchmark
Contraction of Clusterings
aggressive contraction / simple and fast local search
main idea: contract clusterings
clustering paradigm: internally dense, externally sparse
now in parallel!
Label Propagation — Cut-based, Linear-Time Clustering Algorithm
cut-based clustering using size-constrained label propagation
start with singletons, or with the blocks during uncoarsening
traverse the nodes in random order or smallest degree first
move each node to the cluster with the strongest eligible connection
eligible: w.r.t. the size constraint U
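A sequential sketch of size-constrained label propagation (illustrative only; the tie-breaking, the fixed round limit, and the single-candidate move rule are simplifications of what a real partitioner does):

```python
import random

def size_constrained_lp(adj, w, U, rounds=10, seed=0):
    """Start from singletons; repeatedly move each vertex to the
    neighbouring cluster with the strongest connection, skipping any move
    that would push the target cluster's weight above U."""
    rng = random.Random(seed)
    label = {v: v for v in adj}                  # singleton clusters
    size = {v: w[v] for v in adj}                # cluster weights
    order = list(adj)
    for _ in range(rounds):
        rng.shuffle(order)
        moved = False
        for v in order:
            conn = {}
            for u, ew in adj[v]:                 # connection per cluster
                conn[label[u]] = conn.get(label[u], 0) + ew
            best = max(conn, key=conn.get, default=label[v])
            if (best != label[v]
                    and size[best] + w[v] <= U   # size constraint U
                    and conn.get(best, 0) > conn.get(label[v], 0)):
                size[label[v]] -= w[v]
                size[best] += w[v]
                label[v] = best
                moved = True
        if not moved:
            break
    return label

# two triangles joined by a bridge, unit node weights, U = 3
adj = {0: [(1, 1), (2, 1)], 1: [(0, 1), (2, 1)], 2: [(0, 1), (1, 1), (3, 1)],
       3: [(2, 1), (4, 1), (5, 1)], 4: [(3, 1), (5, 1)], 5: [(3, 1), (4, 1)]}
w = {v: 1 for v in adj}
lab = size_constrained_lp(adj, w, U=3)
# the size constraint is respected by construction
assert all(sum(w[v] for v in adj if lab[v] == c) <= 3 for c in set(lab.values()))
```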
Label Propagation — Distributed Memory
now a static graph DS (as in ParMetis)
each PE has a static part of the graph; only block IDs can change
Overlap computation and communication (PE-centric view): scan the local vertices in phases
At the end of phase i:
send the block-ID updates of phase i to the neighboring PEs
receive the block-ID updates of phase i − 1 from the neighboring PEs
while scanning in phase i, the messages are routed through the network
once the algorithm has converged, nothing is communicated
Maintain the Balance Constraint
During Coarsening:
maintaining balance is not a strong constraint
only measure PE-local block weights
During Uncoarsening:
as in ParMetis
Contraction of Clusterings — The Parallel Case, High Level
in parallel: find the number n′ of unique cluster IDs
in parallel: find a mapping C : [0, n − 1] → [0, n′ − 1]
exchange subgraphs, compute the contracted graph locally
when the graph is small, use KaFFPaE as initial partitioner (next lectures)
Parallel Solution Quality
              ParMetis              Fast
graph         best cut    t[s]      best cut    t[s]
amazon        47 010      0.49      45 872      1.85
eu-2005       24 336      30.60     18 404      1.63
youtube       171 857     6.10      171 549     8.74
in-2004       5 276       3.43      3 110       1.38
enwiki        9 553 051   326.92    9 565 648   157.32
uk-2002       697 767     128.71    390 182     19.62
del26         17 609      23.74     16 703      165.02
rgg26         42 739      8.37      37 676      55.91
arabic-2005   *968 871    *1 245.57 471 141     33.45
sk-2005       *           *         3 204 125   471.16
uk-2007       *           *         1 032 000   169.96

k = 2, 32 PEs, 512 GB RAM; * ParMetis on arabic-2005 with 15 PEs
Weak Scaling
Figure: time per edge [s] versus number of PEs p (1–2K) for DelX/RggX with Fast and with ParMetis.
p PEs process an instance with 2^19 · p nodes
Strong Scaling — Delaunay Instances
Figure: total time [s] versus number of PEs p (1–2K) for Fast on Del25, Del27, Del29, Del31 and for ParMetis on Del25, Del27.
DelX: random Delaunay triangulation of 2^X points
Strong Scaling — Social Networks
Figure: total time [s] versus number of PEs p (1–2K) for Fast on uk-2002, arabic-2005, sk-2007, uk-2007 and for Minimal on uk-2007.
ParMetis could not solve any of these instances
Facebook's Approach — Constrained Relocation Problem [Ugander, Backstrom'13]
Balanced Label Propagation:
perform distributed label propagation: determine where every node would prefer to move (and the gain)
for each block pair: sort the candidates by gain
construct a linear program for the constrained relocation
solve the linear program ↔ determine which nodes to move
move the nodes
Linear Program:
sort the nodes that want to move from block V_i to block V_j by gain
label them from 1 to K
g_ij(k): gain when moving the k-th node from V_i to V_j
relocation utility f_ij(x) := Σ_{k=1}^{x} g_ij(k)
Observations:
g_ij(k) ≥ 0
g_ij(k) ≥ g_ij(k + 1)
f_ij is increasing and concave
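The observations can be checked mechanically: with the gains sorted in non-increasing order, the prefix sums f_ij are increasing and concave (illustrative sketch with made-up gain values):

```python
def relocation_utility(gains):
    """f_ij(x) = sum of the x largest gains; sorting in non-increasing
    order makes this prefix-sum function increasing and concave."""
    f = [0]
    for g in sorted(gains, reverse=True):
        f.append(f[-1] + g)
    return f

f = relocation_utility([3, 1, 2, 0])   # hypothetical gains g_ij(k)
assert f == [0, 3, 5, 6, 6]
inc = [f[k + 1] - f[k] for k in range(len(f) - 1)]
assert all(d >= 0 for d in inc)                                # increasing
assert all(inc[k] >= inc[k + 1] for k in range(len(inc) - 1))  # concave
```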
Problem
Given a graph, a partition, and size constraints S_i ≤ |V_i| ≤ T_i, the constrained relocation problem is to maximize
max Σ_{i,j} f_ij(x_ij) s.t.
S_i − |V_i| ≤ Σ_{j≠i} (x_ij − x_ji) ≤ T_i − |V_i| ∀i
0 ≤ x_ij ≤ P_ij ∀i, j
where P_ij is the number of nodes that want to move from V_i to V_j
relax the x_ij → tractable optimization problem
Theorem
Assume a bounded-degree graph G. The objective f(x) = Σ_{i,j} f_ij(x_ij) is a piecewise-linear concave function, separable in the x_ij.

Theorem
Any piecewise-linear concave function f(x) can be written as min_{k=1,…,ℓ} (a_k^T x + b_k).

Theorem
Let x ∈ R^n and f(x) = min_{k=1,…,ℓ} (a_k^T x + b_k) be a piecewise-linear concave function. Maximizing f(x) subject to Ax ≤ b is equivalent to
max z s.t.
Ax ≤ b
a_k^T x + b_k ≥ z ∀k
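The epigraph trick can be demonstrated on a prefix-sum utility: each segment k of a concave f yields one linear piece (a_k, b_k), and the minimum over the pieces reproduces f — which is exactly what the constraints a_k·x + b_k ≥ z encode (illustrative sketch with made-up gains):

```python
def piecewise_from_prefix(f):
    """Turn a concave prefix-sum utility f[0..K] into linear pieces
    (a_k, b_k): segment k has slope a_k = f[k+1] - f[k] and passes
    through (k, f[k]); concavity makes f(x) = min_k (a_k*x + b_k)."""
    pieces = []
    for k in range(len(f) - 1):
        a = f[k + 1] - f[k]
        b = f[k] - a * k
        pieces.append((a, b))
    return pieces

f = [0, 3, 5, 6, 6]        # prefix sums of the sorted gains 3, 2, 1, 0
pieces = piecewise_from_prefix(f)
# the min over the pieces reproduces f at every breakpoint
for x in range(len(f)):
    assert min(a * x + b for a, b in pieces) == f[x]
```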
Theorem
Consider a bounded-degree graph G = (V, E). Under continuous x_ij, the constrained relocation problem can be written as
max Σ_{i,j} z_ij s.t.
S_i − |V_i| ≤ Σ_{j≠i} (x_ij − x_ji) ≤ T_i − |V_i| ∀i
0 ≤ x_ij ≤ P_ij ∀i, j
−a_ijk · x_ij + z_ij ≤ b_ijk ∀i, j, k
where the a_*, b_* derive directly from the f_ij.
Geographic Initialization [Ugander, Backstrom'13]
greedy: geographic information used as a prepartition, or random initialization
Iterating Balanced Label Propagation [Ugander, Backstrom'13]
78 blocks; a single iteration requires 100 CPU days
800 million nodes (users at the time)
Lessons Learned — so far…
What is GP, GC?
Objective functions and principles
Applications
Why are GP, GC hard?
Exact solutions: ILPs
Spectral techniques for GP and GC
Local search, multilevel approach
Contraction algorithms
Parallel algorithms for GP and GC
Lecture Overview
Fundamentals, problem definitions and objective functions
Lots of applications
NP-hardness of GP and GC
Exact partitioning/clustering
Spectral partitioning/clustering
Local search, multilevel algorithms
Parallel, external and semi-external algorithms
Greedy agglomeration / top-down approaches
Min-cut tree clustering
Dynamic clustering, online algorithms
Evolutionary algorithms and meta-heuristics
Greedy Agglomeration / Merge
Figure: dendrogram and current clustering
1. start: singletons
2. iterative agglomerations, yielding the highest gain in quality (or the least decrease)
3. result: best intermediate clustering
modularity: O(n² log n) or O(m d log n); often close to O(n log² n)
(animation: the agglomeration steps repeat frame by frame, extending the dendrogram; the best intermediate clustering reaches a modularity of 0.46)
Larger Dendrogram
collaboration network, |V | ≈ 1000
Greedy Merge and Modularity
modularity has a single peak during agglomeration
simple to implement and rather successful
inefficient for large graphs
only known kernelization: degree-1 vertices
Data structures for efficient greedy agglomeration?
Modularity
mod(C) = m(C)/m − (1/(4m²)) · Σ_{C∈C} (Σ_{v∈C} deg(v))²

Now short: mod(C) = Σ_i (e_ii − a_i²)
e_ij: fraction of edges connecting cluster i to cluster j
a_i = Σ_j e_ij

Exercise: prove the equality.
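A quick numerical sanity check of the claimed equality on a small example (this does not replace the proof asked for in the exercise; the ½-per-direction edge-fraction convention below is one common way to make the short form well-defined):

```python
from collections import defaultdict

def modularity_def(edges, cluster):
    """mod(C) = m(C)/m - (1/4m^2) * sum_C (sum_{v in C} deg(v))^2"""
    m = len(edges)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    intra = sum(1 for u, v in edges if cluster[u] == cluster[v])
    vol = defaultdict(int)
    for v, d in deg.items():
        vol[cluster[v]] += d
    return intra / m - sum(D * D for D in vol.values()) / (4 * m * m)

def modularity_short(edges, cluster):
    """mod(C) = sum_i (e_ii - a_i^2): each edge between clusters i and j
    contributes 1/(2m) to each of e_ij and e_ji."""
    m = len(edges)
    e = defaultdict(float)
    for u, v in edges:
        e[(cluster[u], cluster[v])] += 1 / (2 * m)
        e[(cluster[v], cluster[u])] += 1 / (2 * m)
    a = defaultdict(float)
    for (i, j), val in e.items():
        a[i] += val
    return sum(e[(i, i)] - a[i] ** 2 for i in a)

# two triangles joined by a bridge, clustered as the two triangles
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
cl = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
assert abs(modularity_def(edges, cl) - modularity_short(edges, cl)) < 1e-12
```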
Modularity
Theorem
Joining a pair of communities between which there are no edges can never increase the modularity Q.

Proof (sketch):
mod(C) = Σ_i (e_ii − a_i²)
join block k with block j (call the new block ℓ)
e_kk + e_jj = e_ℓℓ (the edge term does not change)
a_k² + a_j² ≤ a_ℓ² = (a_k + a_j)²
Modularity
We only need to consider merges of adjacent clusters.
Theorem
The change in modularity upon joining two communities is given by ∆Q = e_kj + e_jk − 2·a_k·a_j.

Proof (sketch):
mod(C) = Σ_i (e_ii − a_i²)
join block k with block j (call the new block ℓ)
edges between k and j become covered → account e_kj + e_jk
a_ℓ = a_k + a_j = Σ_j e_ℓj
a_ℓ² = a_k² + 2·a_k·a_j + a_j²
A Matrix for Maintaining ek,j
Maintain a matrix containing e_kj
Update the matrix when joining clusters C_k and C_j
Analysis:
merging the rows and columns corresponding to the joined communities: O(n)
∆Q computed in constant time for each edge: O(m)
at most n − 1 joins → the algorithm runs in O((m + n)·n)
A Matrix for Maintaining Merges
Idea: instead of storing the adjacency matrix, store and update ∆Q_kj
Additionally: keep track of the largest ∆Q_kj
additive update of the gain, easy to handle:
∆Q_kj = mod(C_kj) − mod(C) = e_kj + e_jk − 2·a_k·a_j
C_kj: the clustering obtained when merging k and j

Data structures:
sparse matrix for ∆Q
balanced binary tree for each row (insertion/deletion in O(log n))
max-heap for each row (retrieve the max element in constant time)
additional max-heap H containing the largest element of each row
the max-heaps also store the labels i, j for the elements
ordinary vector storing a_i
A Matrix for Maintaining Merges
Initialization:
start with singletons
e_kj = 1/(2m) if k and j are connected, zero otherwise
a_i = deg(i)/(2m)
thus, initially:
∆Q_kj = 1/(2m) − deg(k)·deg(j)/(4m²) if k, j are connected, 0 otherwise
Algorithm:
1. initialize ∆Q_kj and populate the max-heap H with the largest element of each row
2. select the largest ∆Q_kj from H, join the corresponding communities
3. update the matrix, the heap H, and a_i
4. increment Q by ∆Q_kj
5. repeat 2-4 until only one community remains
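The greedy agglomeration above can be sketched end to end. The version below uses our own naming and deliberately omits the heap and search-tree machinery (it rescans all adjacent cluster pairs each round, so it is far slower than the O(md log n) version), but it follows the same logic: repeatedly merge the adjacent pair with the highest ∆Q = e_kj + e_jk − 2·a_k·a_j and remember the best intermediate clustering.

```python
def greedy_agglomeration(n, edges):
    """Simplified CNM-style greedy agglomeration (no heaps): start
    from singletons, always merge the adjacent cluster pair with the
    highest Delta Q, and return the best intermediate clustering."""
    m = len(edges)
    a = [0.0] * n
    w = [dict() for _ in range(n)]   # w[k][j] = half of (e_kj + e_jk)
    for u, v in edges:
        a[u] += 1.0 / (2 * m)
        a[v] += 1.0 / (2 * m)
        if u != v:
            w[u][v] = w[u].get(v, 0.0) + 1.0 / (2 * m)
            w[v][u] = w[v].get(u, 0.0) + 1.0 / (2 * m)
    active = set(range(n))
    parent = list(range(n))          # records merges (dendrogram edges)
    q = -sum(a[i] ** 2 for i in active)   # singletons: all e_ii = 0
    best_q, best_parent = q, list(parent)
    while len(active) > 1:
        cand = None                  # (gain, k, j) with maximal gain
        for k in active:
            for j, wkj in w[k].items():
                if k < j:
                    gain = 2 * wkj - 2 * a[k] * a[j]
                    if cand is None or gain > cand[0]:
                        cand = (gain, k, j)
        if cand is None:             # no adjacent pairs left
            break
        gain, k, j = cand
        for l, wkl in list(w[k].items()):   # merge cluster k into j
            if l != j:
                w[j][l] = w[j].get(l, 0.0) + wkl
                w[l][j] = w[l].get(j, 0.0) + wkl
            del w[l][k]
        w[k].clear()
        a[j] += a[k]
        active.remove(k)
        parent[k] = j
        q += gain
        if q > best_q:
            best_q, best_parent = q, list(parent)

    def find(x):                     # resolve merge chains to representatives
        while best_parent[x] != x:
            x = best_parent[x]
        return x
    return best_q, [find(v) for v in range(n)]
```

On two triangles joined by a bridge, the best intermediate clustering is the two triangles with Q = 6/7 − 1/2 = 5/14; the final bridge merge has negative gain.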
A Matrix for Maintaining MergesDetails
joining k and j → only update the jth row and column, remove the kth row and column
Theorem
If community ℓ is connected to both k and j, then ∆Q′_jℓ = ∆Q_kℓ + ∆Q_jℓ
If community ℓ is connected to k but not j, then ∆Q′_jℓ = ∆Q_kℓ − 2·a_j·a_ℓ
If community ℓ is connected to j but not k, then ∆Q′_jℓ = ∆Q_jℓ − 2·a_k·a_ℓ
Single peak: once the largest ∆Q < 0, all ∆Q can only decrease (!)
Analysis
Theorem
Each join takes O((deg(k) + deg(j)) log n) time.
Update of the jth row:
insert the elements of the kth row into the jth row, summing wherever an element already exists
deg(k) operations, each taking O(log n) time
update the other elements of the jth row: at most deg(k) + deg(j)
ℓth rows: update a single element in O(log n), at most deg(k) + deg(j) such rows
overall O((deg(k) + deg(j)) log n)
Update of the max-heaps:
rebuild the max-heap of the jth row in O(deg(j))
ℓth-row updates: O(log n) each
at most deg(k) + deg(j) updates of H (each O(log n))
Analysis (cont.)
Theorem
The total running time of the algorithm is O(m·d·log n).
Proof:
each join takes O((deg(k) + deg(j)) log n) time
→ time ≤ O(X log n), where X = sum over all corresponding community degrees in the dendrogram
Worst case:
the degree of a community is the sum of the degrees of all contained vertices
→ each vertex of the original network contributes its degree to all communities it is part of, along the path in the dendrogram from it to the root
the path length is at most d (d = height of the dendrogram)
Hence, the running time is bounded by O(Σ_v deg(v) · d · log n) = O(m·d·log n)
Local Movement
We know: LM is a common technique in graph partitioning
locally greedy
node shifts
hierarchical contractions
multilevel
[e.g. Blondel et al.’08]
Louvain Method[Blondel et al.’08]
modularity-based local movement + multi-level clustering
start with singletons
traverse the nodes in random order
move each node to the neighboring cluster yielding the highest modularity increase
Louvain Method[Blondel et al.’08]
Theorem
The modularity gain obtained by moving an isolated node u into a community C can be computed as
∆Q = s/m + deg(u)²/(4m²) + (Σ_{v∈C} deg(v))²/(4m²) − (deg(u) + Σ_{v∈C} deg(v))²/(4m²)
where s is the number of edges that u has pointing inside C. A similar formula holds if a node u is removed from a cluster C.
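Expanding the square shows that the gain collapses to a compact expression; writing D = Σ_{v∈C} deg(v), a short check:

```latex
\Delta Q
= \frac{s}{m} + \frac{\deg(u)^2 + D^2 - (\deg(u)+D)^2}{4m^2}
= \frac{s}{m} - \frac{2\deg(u)\,D}{4m^2}
= \frac{s}{m} - \frac{\deg(u)\,D}{2m^2}
```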
Algorithm:
remove u from its cluster, compute the gain ∆Q
for each cluster in N(u) compute the gain of adding u
if the highest total gain is positive: move the node accordingly
Important: store the necessary data in a vector (as before)
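A minimal sketch of the local-movement phase, under our own naming and using the simplified gain ∆Q = s/m − deg(u)·D/(2m²), where D is the degree sum of the target cluster after u has been removed; the full Louvain method additionally contracts the clusters and recurses on the coarse graph.

```python
import random

def local_movement(n, edges, seed=0):
    """One Louvain-style local-movement phase (simplified sketch):
    start with singletons, scan nodes in random order, move each node
    to the neighboring cluster with the highest modularity gain, and
    repeat until no strictly improving move exists."""
    rng = random.Random(seed)
    m = len(edges)
    adj = [[] for _ in range(n)]
    deg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1
    cluster = list(range(n))
    D = deg[:]                       # degree sum of each cluster
    improved = True
    while improved:
        improved = False
        order = list(range(n))
        rng.shuffle(order)
        for u in order:
            c_old = cluster[u]
            D[c_old] -= deg[u]       # take u out of its cluster
            s = {}                   # edges from u into each neighboring cluster
            for v in adj[u]:
                s[cluster[v]] = s.get(cluster[v], 0) + 1
            gain = lambda c: s.get(c, 0) / m - deg[u] * D[c] / (2 * m * m)
            best = max(s, key=gain, default=c_old)
            if gain(best) <= gain(c_old):
                best = c_old         # only strictly improving moves
            if best != c_old:
                improved = True
            cluster[u] = best        # (re-)insert u
            D[best] += deg[u]
    return cluster
```

Only strictly improving moves are accepted, so modularity increases monotonically and the loop terminates.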
Louvain MethodMultilevel
Clustering MethodsExperiments
           Karate    Arxiv       Internet
n/m        34/77     9K/24K      70K/351K
Greedy     0.38/0s   0.772/3.6s  0.692/799s
Louvain    0.42/0s   0.813/0s    0.781/1s

           nd.edu       Phone       uk-2005     webbase 2001
n/m        325K/1M      2.04M/5.4M  29M/783M    118M/1B
Greedy     0.927/5034s  -/-         -/-         -/-
Louvain    0.935/3s     0.76/44s    0.979/738s  0.984/152min

Greedy = Greedy Modularity Agglomeration
Local vs. Global Greedy
pro local movement:
local is qualitatively superior
local is quicker in practice
better avoidance of local minima, larger search space
refinement adds a few more percent to the quality of the local approach
local is easier to implement (no heaps, ...)

pro global agglomeration:
global is easier to implement (no contraction, no updates)
stronger runtime guarantees for global
global yields a continuous hierarchy
Min-Cut Tree Clustering
(figure: example social network with named vertices)
1. original graph
2. star-center t, α
3. min-cut tree
4. delete center ⇒ clustering
coined in [Gomory and Hu ’61]
simplified in [Gusfield ’90]
construction via (n− 1) max-flows(variants e.g. O(mn) for unweighted)
Min-Cut Tree Clustering
(figure: for vertices u and v, the lightest edge on the u-v path in the min-cut tree induces a lightest u-v cut in the graph)
1. original graph
2. star-center t, α
3. min-cut tree
4. delete center ⇒ clustering
Min-Cut Tree Clustering
1. original graph
2. star-center t, α
3. min-cut tree
4. delete center ⇒ clustering
quality guarantee [Flake et al. '04]: intra-cluster expansion ≥ α ≥ inter-cluster expansion∗
ω(E(P, C\P)) / min{|P|, |C\P|} ≥ α ≥ ω(E(C, V\C)) / |V\C|
Discussion: Min-Cut Tree Clustering
a vertex v that served to identify a cluster-defining v-t-cut is called the representative of the respective cluster
scaling α yields a nested hierarchy of clusterings
the hierarchy has depth ≤ n − 1
yields a guarantee (very rare!)
the user needs to choose a suitable α carefully
high runtime: O(n) max-flow computations
no "minimum" in the denominator of inter-cluster expansion∗ = ω(E(C, V\C)) / |V\C| (otherwise not always solvable)
Lessons LearnedSo far ...
What is GP, GC?
Objective functions and principles
Applications
Why are GP, GC hard?
Exact Solutions: ILPs
Spectral Techniques for GP and GC
Local Search, Multilevel Approach
Contraction Algorithms
Parallel Algorithms for GP and GC
Greedy Agglomeration / Louvain Method
Min-Cut Tree Clustering (outline)
Lecture Overview
Fundamentals, Problem Definitions and Objective Functions
Lots of Applications
NP-Hardness of GP and GC
Exact Partitioning/Clustering
Spectral Partitioning/Clustering
Local Search, Multilevel Algorithms
Parallel, External and Semi-External Algorithms
Greedy Agglomeration / Top-Down Approaches
Min-Cut Tree Clustering
Dynamic Clustering, Online Algorithms
Evolutionary Algorithms and Meta-Heuristics
The Update Problem
Given: graph G, technique T ⇒ clustering C(G)
Then: modification ∆ ⇒ graph G′, T ⇒ clustering C′(G′)
Question: is there a shortcut?
Online Dynamic Graph Clustering
Dynamic Instances
changing networks with evolving group structure
⇓
Dynamic Approach
update the previous clustering, reacting to changes in the graph
Clustering update problem
Criteria:
speed
quality
smooth transitions
Heuristics Based on Locality
Dynamic modularity maximization without provable quality
Changes in the graph invalidate: affected clusters
⇒ update of the clustering
Changes in the graph invalidate a local area: the 1-hop or 2-hop neighborhood
⇒ update of the clustering
Prep Strategies
locality assumption
Local Heuristic in Bigger Context
motivation: local changes ⇒ local consequences, "revolutions" rare in practice
hope: small changes ⇒ smooth transitions
small search space ⇒ fast
local optimization ⇒ quality?
Prep Strategies: Concept
prep strategy S
reacts to changes
prepares a half-finished preclustering C
passes C on to algorithm A
strategies based, e.g., on
limited local search
backtracking the dendrogram
(figure: ∆(G_{t−1}, G_t) → S → A)
Prep Strategy BT: Illustration
(figure: dendrogram and current clustering)
Prep Strategy Backtrack: backtrack Global's merges according to heuristic rules
Dynamic Modularity-Clustering:Smooth Transitions
(plot: graph-based Rand-distance between consecutive time steps, time steps 0-2000, y-range 0.05-0.25; curves: static Agglo., static Louvain, dynAgglo backtrack, dynLouvain bnd neigh, dynAgglo bnd neigh)
⇒ Dynamics yield smoother transitions[Görke et al.: Modularity-driven clustering of dynamic graphs 2010]
Dynamic Modularity-Clustering:Runtime
(plot: runtime per time step in milliseconds, logarithmic y-axis, time steps 500-2000; curves: static Louvain, static Agglo., dynAgglo bnd neigh, dynLouvain bnd neigh, dynAgglo backtrack)
⇒ dynamics yield lower runtimes[Görke et al.: Modularity-driven clustering of dynamic graphs 2010]
Dynamic Modularity-Clustering:Quality
(plot: modularity over time, time steps 500-2000, y-range 0.38-0.44; curves: dynLouvain bnd neigh, static Louvain, dynAgglo bnd neigh, dynAgglo backtrack, static Agglo.)
⇒ dynamics yield higher quality
[Görke et al.: Modularity-driven clustering of dynamic graphs 2010]
Lessons LearnedSo far ...
What is GP, GC?
Objective functions and principles
Applications
Why are GP, GC hard?
Exact Solutions: ILPs
Spectral Techniques for GP and GC
Local Search, Multilevel Approach
Contraction Algorithms
Parallel Algorithms for GP and GC
Greedy Agglomeration / Louvain Method
Min-Cut Tree Clustering (outline)
Dynamic Clustering, Online Algorithms
Lecture Overview
Fundamentals, Problem Definitions and Objective Functions
Lots of Applications
NP-Hardness of GP and GC
Exact Partitioning/Clustering
Spectral Partitioning/Clustering
Local Search, Multilevel Algorithms
Parallel, External and Semi-External Algorithms
Greedy Agglomeration / Top-Down Approaches
Min-Cut Tree Clustering
Dynamic Clustering, Online Algorithms
Evolutionary Algorithms and Meta-Heuristics
Evolutionary Algorithms
Generalization of local search:
population of solution candidates
reproduction of fit solutions
mutation similar to local search
survival of the fittest
additionally: recombination
Idea: transfer good properties of the parents
Evolutionary AlgorithmsOne of Many Possibilities
Evolutionary Graph Partitioning/Clustering:
individuals ↔ partitions/clusterings
fitness ↔ edge cut / objective

procedure steady-state-EA
  create initial population P
  while stopping criterion not fulfilled
    select parents P1, P2 from P
    combine P1 with P2 to create offspring o
    mutate offspring o
    evict an individual from the population using o
  return the fittest individual that occurred
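The pseudocode maps onto a small generic skeleton. Everything problem-specific (fitness, initial individuals, combine, mutate) is passed in as a callable; the toy operators below are illustrative placeholders of our own, not KaFFPaE's actual operators.

```python
import random

def steady_state_ea(fitness, random_individual, combine, mutate,
                    pop_size=10, steps=2000, seed=0):
    """Steady-state EA following the pseudocode above: select two
    parents by tournament, combine and mutate them into one
    offspring, and let the offspring evict a weak individual."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)       # out of two randoms pick the best
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(steps):
        child = mutate(combine(tournament(), tournament(), rng), rng)
        if fitness(child) > fitness(best):
            best = child                # remember the fittest ever seen
        weakest = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) >= fitness(pop[weakest]):
            pop[weakest] = child        # eviction
    return best

# toy instance: maximize the number of ones in a bit string
n = 20
rand_ind = lambda rng: [rng.randint(0, 1) for _ in range(n)]
uniform_cx = lambda a, b, rng: [a[i] if rng.random() < 0.5 else b[i]
                                for i in range(n)]

def flip_one_bit(x, rng):
    y = x[:]
    y[rng.randrange(n)] ^= 1
    return y

best = steady_state_ea(sum, rand_ind, uniform_cx, flip_one_bit)
```

For graph partitioning, individuals would be partitions and fitness the (negated) edge cut, as the slide's correspondence states.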
Evolutionary Algorithms
1. Evolutionary Graph Partitioning
2. Ensemble Learning for Graph Clustering
Evolutionary AlgorithmsOne of Many Possibilities
procedure steady-state-EA
  create initial population P
  while stopping criterion not fulfilled
    select parents P1, P2 from P
    combine P1 with P2 to create offspring o
    mutate offspring o
    evict an individual from the population using o
  return the fittest individual that occurred

create P using KaFFPa
tournament selection (2×: out of two random individuals pick the best)
combine → following slides
mutation: V-cycle not using the given partition
eviction based on similarity of cuts
Traditional Combine Operations [Soper et al.]
Combine:
given two partitions P1, P2
modify the edge weights:
edges that are cut edges in both parents get the smallest weight
edges that are cut edges in exactly one parent get the second smallest weight
all other edges get a larger weight
+ a small random bias (tie breaking)
apply an MGP algorithm to the graph with modified weights → the resulting partition is the offspring

Extension:
use BFS to define whole areas with smaller weights

Problems:
slow convergence; is the correct partitioning problem being optimized?
running times of up to one week for small graphs
More Natural Combine Operation
(figure: match, contract)
two individuals P1, P2:
do not contract cut edges of P1 or P2, until no matchable edge is left
coarsest graph ↔ Q-graph of the overlay → exchanging good parts is easy
initial solution: use the better of both parents
ExampleTwo Individuals P1, P2
ExampleOverlay of P1, P2
ExampleMultilevel Combine of P1, P2
Exchanging good parts is easyCoarsest Level
(figure: coarsest graph G with nodes v1, v2, v3, v4; >> marks large edge weight, < marks small edge weight)
start with the better partition (red, P2)
move v4 to the opposite block
integrated into the multilevel scheme (+ local search on each level)
ExampleResult of P1, P2
KaFFPaERecombination - Generalization
recombine a partition with any clustering of the graph, e.g.:
k′ ≠ k partitions with larger imbalance
label propagation clustering
natural cuts: sparse cuts close to dense areas [Delling et al. '11]
plug and play: use the clustering that fits your domain
extension on road networks:
natural cuts + contraction, apply KaFFPaE on the contracted graph
Parallelization
each PE has its own island (a local population)
locally: perform combine and mutation operations
communication analogous to randomized rumor spreading:
1. rumor ↔ currently best local partition
2. if the local best partition changed → send it to O(log P) random PEs
3. asynchronous communication (MPI_Isend)
→ quality records in a few minutes for small graphs
More Experiments
Example
street network of Europe: |V| = 18M, |E| = 44M, k = 64
Buffoon ↔ kMetis
Quality of Operations
k      S3R     K3R   KC    SC   (avg. improvement %)
2      591     2.4   1.6   0.2
4      1 304   3.4   4.0   0.2
8      2 336   3.7   3.6   0.2
16     3 723   2.9   2.0   0.2
32     5 720   2.7   3.3   0.0
64     8 463   2.8   3.0   −0.6
128    12 435  3.6   4.5   0.0
256    17 915  3.4   4.2   −0.1

S3R: create three partitions with perturbed weights
K3R: create three partitions with KaFFPa
KC: create two partitions, then combine them
SC: create two partitions with perturbed weights, then combine them
QualityEvolutionary Graph Partitioning
blocks k   KaFFPaE improvement over reps. of KaFFPa
2          0.2%
4          1.0%
8          1.5%
16         2.7%
32         3.4%
64         3.3%
128        3.9%
256        3.7%
overall    2.5%

2h time, 32 cores per graph and k, geometric mean
KaFFPaEvolutionary
(plot: mean min cut vs. normalized time tn for k = 64, y-range 7900-8300; curves: Repetitions, KaFFPaE)
Scalability
(plot: mean min cut (mmc) vs. normalized time tn, y-range 2450-2600; curves for p = 1, 2, 4, 8, 16, 32, 64, 128, 256)
Scalability: p up to 256 PEs
tp = 15360/p seconds per instance
pseudo speedup Sp(tn) = c′1(tn)/c′p(tn), where c′i(tn) = min{ t′ : ci(t′) ≤ c1(tn) }
(plot: mean min cut (mmc) vs. normalized time tn, y-range 2450-2600; curves for p = 1, 2, 4, 8, 16, 32, 64, 128, 256)
Scalability
(plot: pseudo speedup vs. normalized time tn; curves for p = 2, 4, 8, 16, 32, 64, 128, 256)
Walshaw Benchmark
816 instances (ε ∈ {0%, 1%, 3%, 5%})
focus on partition quality
overall quality records (Oct. 1, 2012):

ε     share of records
0%    98%
1%    99%
3%    99%
5%    99%

includes improvements when the current record was used as input; these constitute 4%, 7%, 11%, 9%
includes negative cycle local search within the EA
Evolutionary Algorithms
1. Evolutionary Graph Partitioning
2. Ensemble Learning for Graph Clustering
Ensemble Learning
Ensemble learning is a paradigm in machine learning:
learn several weak classifiers
combine them into a strong classifier

Outline:
determine several weak graph clusterings
determine their maximal overlap (core clusters)
continue to search for a strong clustering
Core Groups Graph Clustering[Ovelgönne, Geyer-Schulz’12]
Question: does a pair of vertices belong to the same cluster?
If all initial clusterings agree on whether a pair of vertices belongs to the same cluster, we can be pretty sure.

Algorithm (CGGC):
1. create ℓ "good" clusterings of G with base algorithm A
2. determine the maximal overlap of the clusterings
3. create the contracted graph induced by the overlap clustering (contraction of the core groups)
4. use base algorithm B to search for a good clustering of the contracted graph
5. project the clustering back to the original graph
Maximal OverlapCore Groups, Ensemble Clusterings
Determine the maximal overlap of clusterings:
Given a set S of clusterings P1, ..., Pℓ, create a new clustering P s.t. for all v, w ∈ V:
(∀i ∈ [1, ℓ]: c_Pi(v) = c_Pi(w)) ⇒ c_P(v) = c_P(w)
(∃i ∈ [1, ℓ]: c_Pi(v) ≠ c_Pi(w)) ⇒ c_P(v) ≠ c_P(w)
Ensemble Clusterings – Repetition: Two Clusterings
combine multiple clusterings
base clusterings: which nodes should belong to the same cluster
combination intuition:
if nodes are in the same block in all clusterings → keep them in the same block
else: the clusterings do not agree → split
[Figure: base clusterings C1 and C2 and their maximal overlap O]
construction: linear scan + hashing
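The linear-scan-plus-hashing construction can be sketched in Python. This is a hedged sketch, not the original implementation: clusterings are assumed to be arrays mapping vertex → cluster id.

```python
def maximal_overlap(clusterings, n):
    """Maximal overlap P of several clusterings of vertices 0..n-1.

    Each clustering maps vertex -> cluster id.  Two vertices end up in
    the same cluster of P iff every input clustering puts them in the
    same cluster: hash the tuple of all their cluster ids.
    """
    core = {}                 # tuple of cluster ids -> new cluster id
    overlap = [0] * n
    for v in range(n):        # one linear scan over the vertices
        key = tuple(c[v] for c in clusterings)
        overlap[v] = core.setdefault(key, len(core))
    return overlap

# Two clusterings of six vertices; vertex 2 is the only disagreement:
P1 = [0, 0, 0, 1, 1, 1]
P2 = [0, 0, 1, 1, 1, 1]
print(maximal_overlap([P1, P2], 6))  # [0, 0, 1, 2, 2, 2]
```

Note how this satisfies both conditions of the definition: full agreement keeps a pair together, any single disagreement separates it.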
Maximal Overlap
k clusterings → do the combination iteratively, or use a k-way hash function to compute P
once P is computed, contract the clustering to obtain the coarse graph
Note: the coarse graph has as many vertices as P has clusters
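The contraction step might look as follows, under the assumption that the graph is given as a weighted edge list: edges inside a core group disappear, and parallel edges between core groups are merged by summing their weights.

```python
from collections import defaultdict

def contract(edges, clustering):
    """Contract a graph along a clustering (core groups).

    edges: list of (u, v, weight); clustering: vertex -> cluster id.
    Returns the coarse graph's weighted edge list; there is one coarse
    vertex per cluster of the overlap clustering.
    """
    coarse = defaultdict(int)
    for u, v, w in edges:
        cu, cv = clustering[u], clustering[v]
        if cu != cv:                       # drop edges inside a core group
            coarse[min(cu, cv), max(cu, cv)] += w
    return [(a, b, w) for (a, b), w in sorted(coarse.items())]

# A 4-cycle whose left and right halves form two core groups:
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
print(contract(edges, [0, 0, 1, 1]))  # [(0, 1, 2)]
```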
Iterated Approach
Iterated Approach (CGGCi):
1. set Pbest to singletons and the working graph to G
2. create ℓ “good” clusterings of the working graph with base algorithm A
3. determine the maximal overlap P of the clusterings
4. if P is better, update Pbest and the working graph (induced by P); goto step 2
5. use base algorithm B to search for a good clustering of the working graph
6. project the clustering back to the original graph

Does this define an evolutionary algorithm?
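Putting the pieces together, the iterated scheme can be sketched as below. `base_algorithm` and `quality` are assumed callbacks (e.g. label propagation and modularity) standing in for the base algorithms A/B and the objective; this is an illustrative sketch, not the reference CGGCi implementation.

```python
import random
from collections import defaultdict

def cggci(n, edges, base_algorithm, quality, ell=5, seed=0):
    """Iterated core-groups clustering (CGGCi) sketch.

    base_algorithm(n, edges, rng) -> clustering (vertex -> cluster id)
    quality(clustering, edges)    -> score to maximise
    """
    rng = random.Random(seed)

    def overlap(runs):                  # maximal overlap via hashing
        ids = {}
        return [ids.setdefault(key, len(ids)) for key in zip(*runs)]

    def contract(es, cl):               # contract core groups
        coarse = defaultdict(int)
        for u, v, w in es:
            if cl[u] != cl[v]:
                coarse[min(cl[u], cl[v]), max(cl[u], cl[v])] += w
        return [(a, b, w) for (a, b), w in coarse.items()]

    proj = list(range(n))               # 1. P_best = singletons, work on G
    cur_n, cur_edges = n, edges
    best_score = quality(proj, edges)
    while True:
        runs = [base_algorithm(cur_n, cur_edges, rng) for _ in range(ell)]  # 2.
        core = overlap(runs)                                               # 3.
        cand = [core[proj[v]] for v in range(n)]
        score = quality(cand, edges)
        if score <= best_score:         # 4. no improvement -> stop iterating
            break
        best_score, proj = score, cand  #    update P_best and working graph
        cur_edges = contract(cur_edges, core)
        cur_n = max(core) + 1
    final = base_algorithm(cur_n, cur_edges, rng)   # 5. base algorithm B
    cand = [final[proj[v]] for v in range(n)]       # 6. project back
    return cand if quality(cand, edges) > best_score else proj
```

The loop mirrors steps 1–4 above, with `ell` playing the role of ℓ; the maximal overlap is recomputed on the current working graph each round.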
Experiments
Conclusion
CGGCi scored most of the points in the Graph Clustering part of the 10th DIMACS Implementation Challenge
KaFFPaE scored most of the points in the Graph Partitioning part of the 10th DIMACS Implementation Challenge
both are “natural” evolutionary algorithms
there is a MapReduce implementation of CGGC (using label propagation)
Lessons Learned
What is GP, GC?
Objective functions and principles
Applications
Why are GP, GC hard?
Exact solutions: ILPs
Spectral techniques for GP and GC
Local search, multilevel approach
Contraction algorithms
Parallel algorithms for GP and GC
Greedy agglomeration / Louvain method
Min-cut tree clustering (outline)
Dynamic clustering, online algorithms
Evolutionary algorithms for GP and GC
References
[1] M. Holtgrewe, P. Sanders and C. Schulz. Engineering a Scalable High Quality Graph Partitioner. In 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS’10), 2010.
[2] P. Sanders and C. Schulz. Engineering Multilevel Graph Partitioning Algorithms. In Proceedings of the 19th European Symposium on Algorithms (ESA’11), volume 6942 of Lecture Notes in Computer Science, pages 469–480. Springer, 2011.
[3] P. Sanders and C. Schulz. Distributed Evolutionary Graph Partitioning. In Proceedings of the 12th Workshop on Algorithm Engineering and Experimentation (ALENEX’12), pages 16–19, 2012.
[4] I. Safro, P. Sanders and C. Schulz. Advanced Coarsening Schemes for Graph Partitioning. In Proceedings of the 11th Symposium on Experimental Algorithms (SEA’12), volume 7276 of Lecture Notes in Computer Science, pages 369–380. Springer, 2012.
[5] P. Sanders and C. Schulz. Think Locally, Act Globally: Highly Balanced Graph Partitioning. In Proceedings of the 12th Symposium on Experimental Algorithms (SEA’13), volume 7933 of Lecture Notes in Computer Science, pages 164–175. Springer, 2013.
[6] M. Birn, V. Osipov, P. Sanders, C. Schulz and N. Sitchinava. Efficient Parallel and External Matchings. In Proceedings of the 19th International Conference, Euro-Par 2013, volume 8097 of Lecture Notes in Computer Science, pages 659–670. Springer, 2013.
[7] H. Meyerhenke, P. Sanders and C. Schulz. Partitioning Complex Networks via Size-constrained Clustering. In Proceedings of the 13th Symposium on Experimental Algorithms (SEA’14), volume 8504 of Lecture Notes in Computer Science, pages 351–363. Springer, 2014.
References
[1] Y. Akhremtsev, P. Sanders and C. Schulz. (Semi-)External Algorithms for Graph Partitioning and Clustering. In Proceedings of the 15th Workshop on Algorithm Engineering and Experimentation (ALENEX’15), pages 33–43, 2015.
[2] H. Meyerhenke, P. Sanders and C. Schulz. Parallel Graph Partitioning for Complex Networks. In 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS’15), to appear, 2015.
[3] D. Bader, A. Kappes, H. Meyerhenke, P. Sanders, C. Schulz and D. Wagner. Benchmarking for Graph Clustering and Partitioning. In Encyclopedia of Social Network Analysis and Mining, 2014.
[4] C. Schulz. High Quality Graph Partitioning. Ph.D. Thesis, Karlsruhe Institute of Technology, 2013.
[5] A. Buluc, H. Meyerhenke, I. Safro, P. Sanders and C. Schulz. Recent Advances in Graph Partitioning. Technical report, ITI Sanders, Department of Informatics, Karlsruhe Institute of Technology, 2013. (arXiv:1311.3144)
[6] K. Andreev and H. Räcke. Balanced Graph Partitioning. Theory of Computing Systems, 39(6):929–939, 2006.
[7] S. T. Barnard and H. D. Simon. A Fast Multilevel Implementation of Recursive Spectral Bisection for Partitioning Unstructured Problems. In Proceedings of the 6th SIAM Conference on Parallel Processing for Scientific Computing, pages 711–718, 1993.
References
[1] U. Brandes, M. Gaertler, and D. Wagner. Engineering graph clustering: Models and experimental evaluation. ACM Journal of Experimental Algorithmics, 12, 2007.
[2] R. Brillout. A Multi-Level Framework for Bisection Heuristics. Bachelor thesis, Karlsruhe, Baden-Württemberg, Karlsruhe Institute of Technology, 2009.
[3] Ü. V. Çatalyürek and C. Aykanat. Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication. In Proceedings of the 3rd International Workshop on Parallel Algorithms for Irregularly Structured Problems, volume 1117, pages 75–86. Springer, 1996.
[4] D. Delling, A. V. Goldberg, T. Pajor, and R. F. Werneck. Customizable Route Planning. In Proceedings of the 10th International Symposium on Experimental Algorithms, volume 6630 of LNCS, pages 376–387. Springer, 2011.
[5] C. M. Fiduccia and R. M. Mattheyses. A Linear-Time Heuristic for Improving Network Partitions. In Proceedings of the 19th Conference on Design Automation, pages 175–181, 1982.
[6] M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some Simplified NP-Complete Problems. In Proceedings of the 6th ACM Symposium on Theory of Computing, STOC ’74, pages 47–63. ACM, 1974.
[7] R. Görke. An Algorithmic Walk from Static to Dynamic Graph Clustering. PhD thesis, Karlsruhe Institute of Technology, 2010.
[8] B. Hendrickson. Graph Partitioning and Parallel Solvers: Has the Emperor No Clothes? In Proceedings of the 5th International Symposium on Solving Irregularly Structured Problems in Parallel, volume 1457 of LNCS, pages 218–225. Springer, 1998.
References
[1] M. Holzer, F. Schulz, and D. Wagner. Engineering multi-level overlay graphs for shortest-path queries. In Proceedings of the Eighth Workshop on Algorithm Engineering and Experiments (ALENEX 2006), Miami, Florida, USA, January 21, 2006, pages 156–170, 2006.
[2] R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. Journal of the ACM, 51(3):497–515, 2004.
[3] B. W. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. The Bell System Technical Journal, 49(1):291–307, 1970.
[4] J. M. Kleinberg. An Impossibility Theorem for Clustering. In Advances in Neural Information Processing Systems 15 (NIPS 2002, December 9–14, 2002, Vancouver, British Columbia, Canada), pages 446–453, 2002.
[5] R. H. Möhring, H. Schilling, B. Schütz, D. Wagner, and T. Willhalm. Partitioning Graphs to Speed Up Dijkstra’s Algorithm. Journal of Experimental Algorithmics (JEA), 11 (2006), 2007.
[6] M. E. J. Newman and M. Girvan. Finding and Evaluating Community Structure in Networks. Physical Review E, 69(2):026113, 2004.
[7] V. Osipov and P. Sanders. n-Level Graph Partitioning. In Mark de Berg and Ulrich Meyer, editors, Algorithms – ESA 2010, volume 6346 of Lecture Notes in Computer Science, pages 278–289. Springer Berlin Heidelberg, 2010.
[8] K. Schloegel, G. Karypis, and V. Kumar. Graph Partitioning for High Performance Scientific Simulations. In The Sourcebook of Parallel Computing, pages 491–541, 2003.
References
[1] R. Shamir, R. Sharan, and D. Tsur. Cluster Graph Modification Problems. In Graph-Theoretic Concepts in Computer Science, pages 379–390. Springer, 2002.
[2] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[3] J. Šíma and S. E. Schaeffer. On the NP-completeness of Some Graph Cluster Measures. In SOFSEM 2006: Theory and Practice of Computer Science, pages 530–537. Springer, 2006.
[4] C. Walshaw. Multilevel Refinement for Combinatorial Optimisation Problems. Annals of Operations Research, 131(1):325–372, 2004.