
Daniel A. Spielman

MIT

Fast, Randomized Algorithms for Partitioning, Sparsification, and the Solution of Linear Systems

Joint work with Shang-Hua Teng (Boston University)

Papers

Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems (S-Teng '04)

The Mixing Rate of Markov Chains, an Isoperimetric Inequality, and Computing the Volume (Lovász-Simonovits '93)

The Eigenvalues of Random Symmetric Matrices (Füredi-Komlós '81)

Overview

Preconditioning to solve linear systems: find B that approximates A, such that solving By = c is easy.

Three techniques:

Augmented spanning trees [Vaidya '90]

Sparsification: approximate the graph by a sparse graph

Partitioning: runtime proportional to the number of nodes removed

Outline

I. Linear system solvers

II. Sparsification using partitioning and random sampling

III. Graph partitioning by truncated random walks

Tools: eigenvalues of random graphs; combinatorial and spectral graph theory.

Diagonally Dominant Matrices

Solve Ax = b, where each diagonal entry of A is at least the sum of the absolute values of the other entries in its row.

Example: Laplacian matrices of weighted graphs: A(i,j) = negative of the weight from i to j, and each diagonal A(i,i) = the weighted degree of node i.
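As a concrete illustration (mine, not from the talk), a minimal sketch of building a weighted-graph Laplacian and checking diagonal dominance; the graph and helper name are made up for the example:

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Laplacian of a weighted graph: L[i][j] = -w(i,j) for an edge,
    L[i][i] = weighted degree of node i."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L

L = laplacian(3, [(0, 1, 2.0), (1, 2, 1.0)])
# Diagonally dominant: each diagonal is at least the sum of the
# absolute values of the other entries in its row (equality here).
for i in range(3):
    assert L[i, i] >= sum(abs(L[i, j]) for j in range(3) if j != i)
```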

Complexity of Solving Ax = b, A positive semi-definite

General direct methods:

Gaussian elimination (Cholesky): O(n^3)

Fast matrix inversion: O(n^2.376)

Conjugate gradient: O(mn)

n is the dimension, m is the number of non-zeros.

Complexity of Solving Ax = b, A positive semi-definite, structured

Path: forward and backward elimination, O(n)

Tree: like the path, working up from the leaves, O(n)

Planar: nested dissection [Lipton-Rose-Tarjan '79], O(n^1.5)

These methods express A = L L^T.
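To make the path case concrete, a sketch of forward and backward elimination on a tridiagonal (path-structured) system; the helper name is illustrative, and we assume the system is nonsingular (e.g., a Dirichlet problem):

```python
import numpy as np

def solve_tridiagonal(d, off, b):
    """Solve a tridiagonal system in O(n): row i reads
    off[i-1]*x[i-1] + d[i]*x[i] + off[i]*x[i+1] = b[i]."""
    n = len(d)
    d, b = d.astype(float).copy(), b.astype(float).copy()
    for i in range(1, n):                 # forward elimination
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        b[i] -= w * b[i - 1]
    x = np.zeros(n)
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):        # backward substitution
        x[i] = (b[i] - off[i] * x[i + 1]) / d[i]
    return x

# Dirichlet Laplacian of a path: diagonal 2, off-diagonals -1.
n = 5
x = solve_tridiagonal(2 * np.ones(n), -np.ones(n - 1), np.ones(n))
```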

Iterative Methods: Preconditioned Conjugate Gradient

Find an easy B that approximates A. Iteratively solve Ax = b in time about sqrt(kappa(A,B)) * (m + time to solve By = c) * log(1/eps), where kappa(A,B), the relative condition number, measures the quality of the approximation.

(We omit the conjugate gradient details for the rest of the talk.)
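A minimal sketch of the standard preconditioned conjugate gradient loop (not the talk's specific solver); solve_B plays the role of solving By = c, and the Jacobi preconditioner below is only a stand-in for a subgraph preconditioner:

```python
import numpy as np

def pcg(A, b, solve_B, tol=1e-8, max_iter=1000):
    """Solve Ax = b, where solve_B(r) applies the preconditioner B^{-1}."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = solve_B(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = solve_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[3.0, -1, -1], [-1, 3, -1], [-1, -1, 3]])
x = pcg(A, np.array([1.0, 2, 3]), lambda r: r / np.diag(A))
```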

Main Result

For symmetric, diagonally dominant A

General: m log^{O(1)} n log(1/eps)

Planar:

Vaidya's Subgraph Preconditioners

Precondition A by the Laplacian of a subgraph, B.

[Figure: a graph A on vertices 1-4 and its subgraph B]

Vaidya '90: relate kappa(A,B) to a graph embedding (congestion/dilation); use the MST, or an augmented MST.

History

Gremban, Miller ‘96 Steiner vertices, many details

Bern, Boman, Chen, Gilbert, Hendrickson, Nguyen, Toledo All the details, algebraic approach

Joshi ’97, Reif ‘98 Recursive, multi-level approach

planar:

planar:

Maggs, Miller, Parekh, Ravi, Woo ‘02after preprocessing

History

Boman, Hendrickson '01: low-stretch spanning trees

S-Teng '03: augmented low-stretch trees; clustering, partitioning, sparsification, recursion; lower-stretch spanning trees

Bounding kappa(A,B)

Write B ≼ A if A - B is positive semi-definite, that is, x^T B x <= x^T A x for all x; and A ≽ 0 if A is positive semi-definite.

For A the Laplacian of a graph with edges E: x^T A x = the sum over (i,j) in E of w_ij (x_i - x_j)^2.

The relative condition number kappa(A,B) is the ratio of the largest to the smallest generalized eigenvalue of the pair (A, B).

For B a subgraph of A, we have B ≼ A, and so only the largest generalized eigenvalue remains to be bounded.

Fundamental Inequality

[Figure: the edge (1,8) routed along the path 1-2-3-4-5-6-7-8]

(x_1 - x_8)^2 <= 7 [(x_1 - x_2)^2 + (x_2 - x_3)^2 + ... + (x_7 - x_8)^2]

In general, if an edge (u,v) is routed along a path of length k, then (x_u - x_v)^2 <= k times the sum of (x_i - x_j)^2 over the path edges, by Cauchy-Schwarz.

Application to eigenvalues of graphs

lambda_1 = 0 for Laplacian matrices, and lambda_2 = the minimum of x^T A x / x^T x over x orthogonal to the all-ones vector.

Example: for the complete graph on n nodes, all non-zero eigenvalues equal n.

For the path, the test vector x = (-7, -5, -3, -1, 1, 3, 5, 7) (here n = 8) shows lambda_2 = O(1/n^2).

Lower bound on lambda_2 [Guattery-Leighton-Miller '97]: route every edge of the complete graph K_n along the path. The fundamental inequality gives x^T L_{K_n} x <= O(n^3) x^T L_{path} x, and since x^T L_{K_n} x = n x^T x for x orthogonal to the all-ones vector, lambda_2(path) >= Omega(1/n^2).
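A quick numerical check (mine, not the talk's) that lambda_2 of the path is of order 1/n^2:

```python
import numpy as np

n = 64
# Laplacian of the path on n vertices.
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1
lam2 = np.sort(np.linalg.eigvalsh(L))[1]
# Exact value is 2(1 - cos(pi/n)) ~ pi^2/n^2, matching the Theta(1/n^2) bounds.
print(lam2, np.pi**2 / n**2)
```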

Preconditioning with a Spanning Tree B

[Figure: a graph A and a spanning tree B]

Every edge of A not in B has a unique path in B.

When B is a tree, the fundamental inequality bounds the largest generalized eigenvalue by the total stretch of the edges of A over B.

Low-Stretch Spanning Trees

The stretch of an edge (u,v) is the length of the tree path from u to v (relative to the edge's own length); the total stretch is the sum over all edges of A.

Theorem (Alon-Karp-Peleg-West '91): every graph has a spanning tree of total stretch m * 2^{O(sqrt(log n log log n))}.

Theorem (Elkin-Emek-S-Teng '04): every graph has a spanning tree of total stretch O(m log^2 n log log n).

Theorem (Boman-Hendrickson '01): preconditioning with a spanning tree B of total stretch sigma gives kappa(A,B) <= sigma.

Vaidya's Augmented Spanning Trees

B: a spanning tree plus s edges in total.

Adding Edges to a Tree

Partition the tree into t sub-trees, balancing stretch. For sub-trees connected in A, add one such bridge edge, carefully (shown in blue in the figure).

Theorem: this bounds kappa(A,B) in general, with a better bound if A is planar; the bound improves as the number t of sub-trees grows.
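To make "stretch" concrete, a small sketch (my example, unweighted) computing the total stretch of a spanning tree:

```python
from collections import deque

def tree_path_length(tree_adj, u, v):
    """BFS distance from u to v in the tree (unweighted)."""
    dist = {u: 0}
    q = deque([u])
    while q:
        x = q.popleft()
        if x == v:
            return dist[x]
        for y in tree_adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    raise ValueError("v not reachable from u")

def total_stretch(n, edges, tree_edges):
    """Sum, over all graph edges, of their path length in the spanning tree."""
    tree_adj = {i: [] for i in range(n)}
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)
    return sum(tree_path_length(tree_adj, u, v) for u, v in edges)

# Cycle on 6 vertices; tree = the path 0-1-2-3-4-5. Edge (5,0) has stretch 5.
edges = [(i, (i + 1) % 6) for i in range(6)]
print(total_stretch(6, edges, edges[:-1]))   # 5 tree edges of stretch 1, plus 5
```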

Sparsification (Feder-Motwani '91, Benczur-Karger '96)

All graphs can be well-approximated by a sparse graph.

See also: D. Eppstein, Z. Galil, G.F. Italiano, and T. Spencer '93; D. Eppstein, Z. Galil, G.F. Italiano, and A. Nissenzweig '97.

Benczur-Karger '96: can find a sparse subgraph H, with O(n log n / eps^2) edges, such that every cut in H has weight within a (1 ± eps) factor of its weight in G.

We need the stronger spectral approximation x^T L_H x ≈ x^T L_G x for all x. (The edges of H are a subset of those of G, but with different weights.)

Example: Complete Graph

If A is the Laplacian of K_n, all non-zero eigenvalues are n.

If B is the Laplacian of a d-regular Ramanujan expander, all non-zero eigenvalues satisfy d - 2 sqrt(d-1) <= lambda <= d + 2 sqrt(d-1).

And so (n/d) B approximates A well: kappa(A, (n/d)B) <= (d + 2 sqrt(d-1)) / (d - 2 sqrt(d-1)).

Example: Dumbbell

A: two copies of K_n joined by a single middle edge.

If B does not contain the middle edge, the approximation fails: take x constant on each copy, with different constants; then x^T B x = 0 but x^T A x > 0. So the middle edge must be kept.

Example: Grid plus edge

[Figure: a grid of unit-weight edges plus one heavy edge of weight k; the relevant quantities compare (m-1)^2 + k(m-1) against (m-1)^2]

Morals: random sampling alone is not sufficient, and cut approximation is not sufficient.

Conductance

A cut is a partition of the vertices into (S, V - S).

Conductance of S: Phi(S) = (weight of the edges leaving S) / min(vol(S), vol(V - S)), where vol(S) is the sum of the degrees of the vertices in S.

Conductance of G: Phi_G = the minimum of Phi(S) over all S.

[Figure: cuts of low and high conductance]
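A direct transcription of the definition (a sketch; the adjacency is given as a dense weight matrix):

```python
import numpy as np

def conductance(W, S):
    """Phi(S) = weight of edges leaving S / min(vol(S), vol(complement))."""
    n = W.shape[0]
    in_S = np.zeros(n, dtype=bool)
    in_S[list(S)] = True
    vol_S = W[in_S].sum()            # sum of degrees inside S
    vol_rest = W[~in_S].sum()
    cut = W[in_S][:, ~in_S].sum()    # weight crossing the cut
    return cut / min(vol_S, vol_rest)

# Dumbbell-like example: two triangles joined by one edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(conductance(W, {0, 1, 2}))     # 1/7: one crossing edge, volume 7
```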

Conductance and Sparsification

If the conductance is high (an expander): can precondition by random sampling.

If the conductance is low: can partition the graph by removing few edges.

Decomposition: a partition of the vertex set that removes few edges, such that the graph on each part has high conductance.

Graph Decomposition

Lemma: there exists a partition of the vertices V = V_1 ∪ ... ∪ V_k such that each V_i has large conductance, and at most half the edges cross the partition.

Alg: sample the edges inside each part; recurse on the crossing edges.

Graph Decomposition Exists

Proof: let S be the largest set of small conductance with at most half the volume. If no such S exists, the whole graph already has high conductance, and we are done.

[Figure: the set S and its complement V - S]

Otherwise, remove the edges crossing (S, V - S). If S is small, only recurse in S. If S is big, then V - S is not too big. Either way, the recursion depth is bounded.

Sparsification from Graph Decomposition

Alg: sample the edges inside each part; recurse on the crossing edges.

Theorem: there exists B with n log^{O(1)} n edges.

Remaining problem: we need to find the partition in nearly-linear time.
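A toy sketch of the decompose-sample-recurse scheme. The partitioner here is plain spectral bisection, a stand-in for the talk's nearly-linear-time partitioner; the sampling rule is the degree-based one described in Part II; everything is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_within(W, k):
    """Keep edge (i,j) with prob p = min(1, k/min(d_i, d_j)); scale kept weights by 1/p."""
    deg = W.sum(axis=1)
    H = np.zeros_like(W)
    n = W.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if W[i, j] > 0:
                p = min(1.0, k / min(deg[i], deg[j]))
                if rng.random() < p:
                    H[i, j] = H[j, i] = W[i, j] / p
    return H

def spectral_split(W):
    """Stand-in partitioner (NOT the talk's algorithm): sign of the Fiedler vector."""
    L = np.diag(W.sum(axis=1)) - W
    fiedler = np.linalg.eigh(L)[1][:, 1]
    return fiedler >= 0

def sparsify(W, k, depth=0):
    """Sample the edges inside the two parts; recurse on the crossing edges."""
    if depth > 20 or (W > 0).sum() <= 2 * k * W.shape[0]:   # already sparse
        return W.copy()
    side = spectral_split(W)
    inside = np.outer(side, side) | np.outer(~side, ~side)
    return sample_within(W * inside, k) + sparsify(W * ~inside, k, depth + 1)
```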

Sparsification

Thm: for all A, we can find such a B with n log^{O(1)} n edges.

Thm: given G, we can find a subgraph H that spectrally approximates G, with #edges(H) < n log^{O(1)} n, in time m log^{O(1)} n.

General solve time: m log^{O(1)} n log(1/eps).

Cheeger's Inequality [Sinclair-Jerrum '89]

For a Laplacian L, and D the diagonal matrix with D(i,i) = degree of node i:

lambda_2(D^{-1}L) / 2 <= Phi_G <= sqrt(2 lambda_2(D^{-1}L))

Useful if lambda_2 is big.

Random Sampling

Given a graph G, randomly sample to get G~, so that the difference of the degree-normalized Laplacians of G and G~ is small in norm, where D~ is the diagonal matrix of the degrees in G~. Useful if the conductance of G is big.

If that norm is small, then for all x orthogonal to the (mutual) nullspace, x^T L~ x ≈ x^T L x.

But we don't want D in there. D has no impact: a norm bound stated with D~ implies the same bound with D.

Work with the adjacency matrix instead. This makes no difference, because L = D - A, where A(i,j) = the weight of the edge from i to j, and 0 if i = j.

Random Sampling Rule

Choose a parameter k governing the sparsity of G~.

Keep edge (i,j) with probability p_ij = min(1, k / min(d_i, d_j)).

If we keep the edge, raise its weight by a factor 1/p_ij.

Reweighting guarantees E[A~] = A. And the choice of p_ij guarantees we expect at most kn edges in G~, since the sum over edges of k/min(d_i, d_j) is at most the sum over edges of k(1/d_i + 1/d_j) = kn.
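The rule in code, with an empirical check of both guarantees (a sketch; the exact probability in the paper may differ in constants, so treat p_ij here as illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(W, k):
    """One sample: keep (i,j) with p = min(1, k/min(d_i, d_j)), weight scaled by 1/p."""
    deg = W.sum(axis=1)
    H = np.zeros_like(W)
    n = W.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if W[i, j] > 0:
                p = min(1.0, k / min(deg[i], deg[j]))
                if rng.random() < p:
                    H[i, j] = H[j, i] = W[i, j] / p
    return H

# Random test graph; check E[H] = W entrywise and expected #edges <= k*n.
W = (rng.random((30, 30)) < 0.4).astype(float)
W = np.triu(W, 1) + np.triu(W, 1).T
samples = [sample_graph(W, k=3) for _ in range(2000)]
print(np.abs(np.mean(samples, axis=0) - W).max())      # close to 0
print(np.mean([(H > 0).sum() / 2 for H in samples]))   # at most 3 * 30
```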

Random Sampling Theorem

Theorem: with high probability, the sampled A~ is close to A in norm; the bound is useful for graphs of high conductance.

From now on, all edges have weight 1: we will prove the unweighted case.

Analysis by Trace

Trace = sum of the diagonal entries = sum of the eigenvalues.

For even k: ||M||^k = max_i |lambda_i|^k <= sum_i lambda_i^k = Tr(M^k).
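A one-off numerical check (mine) of the norm-trace inequality for even k:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
M = (M + M.T) / 2                        # symmetric, like A - A~
k = 8                                    # even
# ||M||^k = max |eig|^k <= sum of eig^k = Tr(M^k), since eig^k >= 0 for even k.
assert np.linalg.norm(M, 2) ** k <= np.trace(np.linalg.matrix_power(M, k)) + 1e-6
```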

Analysis by Trace

Main Lemma: a bound on the expected trace E[Tr((A - A~)^k)].

Proof of Theorem: apply Markov's inequality to Tr((A - A~)^k), then take the k-th root.

Expected Trace

Tr((A - A~)^k) expands as a sum over closed walks v_0, v_1, ..., v_k = v_0 of the products of the entries (A - A~)(v_i, v_{i+1}).

[Figure: a closed walk on vertices 1, 2, 3, 4]

Most terms are zero in expectation, because E[(A - A~)(i,j)] = 0 and distinct edges are sampled independently. So the sum is zero unless each edge appears at least twice.

Will code such walks to count them.

Coding walks

For a sequence v_0, v_1, ..., v_k in which each edge appears at least twice:

S = {i : the edge (v_{i-1}, v_i) was not used before, in either direction}

For i not in S, sigma(i) = the index of the time the edge was used before, in either direction.

Coding walks: Example

[Figure: a closed walk on the graph with vertices 1, 2, 3, 4]

step: 0, 1, 2, 3, 4, 5, 6, 7, 8
vert: 1, 2, 3, 4, 2, 3, 4, 2, 1

S = {1, 2, 3, 4}: the edges first used at steps 1 through 4 are (1,2), (2,3), (3,4), (4,2).

sigma: step 5 reuses the edge of step 2, step 6 that of step 3, step 7 that of step 4, and step 8 that of step 1.
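The coding map in code (a sketch; it reproduces the S and sigma of the example above, recording the first prior use of each edge):

```python
def code_walk(verts):
    """Given a closed walk v_0..v_k, return S (steps using a new edge) and
    sigma (for each other step, the earlier step whose edge it reuses)."""
    S, sigma, first_use = [], {}, {}
    for i in range(1, len(verts)):
        e = frozenset((verts[i - 1], verts[i]))   # edge, in either direction
        if e not in first_use:
            S.append(i)
            first_use[e] = i
        else:
            sigma[i] = first_use[e]               # index of the previous use
    return S, sigma

# The walk from the slide: vert 1, 2, 3, 4, 2, 3, 4, 2, 1.
print(code_walk([1, 2, 3, 4, 2, 3, 4, 2, 1]))
# ([1, 2, 3, 4], {5: 2, 6: 3, 7: 4, 8: 1})
```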

Valid codes (S, sigma)

For each i in S: v_i must be a neighbor of v_{i-1} whose edge has probability of being chosen < 1, so that (A - A~)(v_{i-1}, v_i) can be non-zero.

For each i not in S: the walk must take the edge indicated by sigma(i), that is, the edge used at step sigma(i).

Expected Trace from Code

There are a bounded number of ways to choose the walk given its code (S, sigma); summing this bound over all valid codes bounds the expected trace.

Random Sampling Theorem

Theorem (recap): the sampled A~ is close to A in norm, which is useful for graphs of high conductance.

Conclusion: random sampling sparsifies graphs of high conductance.

Graph Partitioning Algorithms

SDP/LP: too slow.

Spectral: finds one cut quickly, but it can be unbalanced, requiring many runs.

Multilevel (Chaco/Metis): can't analyze; misses small sparse cuts.

New Alg: based on truncated random walks. Approximates the optimal balance, can be used to decompose, and has nearly-linear run time.

Lazy Random Walk

At each step: stay put with prob 1/2; otherwise, move to a neighbor according to the edge weights.

Diffusing probability mass: keep 1/2 for self, distribute the rest among the neighbors. Or, with self-loops: distribute evenly over the edges.

[Figure: a 4-vertex weighted graph with degrees 1, 3, 2, 2; the mass starts at the degree-1 vertex]

start: (1, 0, 0, 0)
step 1: (1/2, 1/2, 0, 0)
step 2: (1/3, 1/2, 1/12, 1/12)
step 3: (1/4, 11/24, 7/48, 7/48)
in the limit: (1/8, 3/8, 2/8, 2/8)

In the limit, the probability mass is proportional to degree.
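The diffusion above, reproduced in code (the 4-vertex graph with degrees 1, 3, 2, 2 is the one with edges (1,2), (2,3), (2,4), (3,4); indices below are 0-based):

```python
import numpy as np

A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)                  # degrees 1, 3, 2, 2
W = 0.5 * (np.eye(4) + A / d)      # lazy walk matrix, acting on distributions

p = np.array([1.0, 0, 0, 0])
for step in range(3):
    p = W @ p
    print(p)   # 1/2,1/2,0,0  then  1/3,1/2,1/12,1/12  then  1/4,11/24,7/48,7/48
for step in range(500):
    p = W @ p
print(p, d / d.sum())              # limit (1/8, 3/8, 2/8, 2/8): mass ~ degree
```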

Why lazy?

Otherwise, there might be no limit: on a single edge, all the mass oscillates between the two endpoints forever.

Why so lazy as to keep 1/2 of the mass? It ties the walk to diagonally dominant matrices: the walk matrix is W = (1/2)(I + A D^{-1}), where A(i,j) = the weight of the edge from i to j, D(i,i) = the degree of node i, and W(i,j) = the probability of going from j to i.

Rate of convergence

Cheeger's inequality: the convergence rate is related to the conductance.

If the conductance is low, convergence is slow: if we start uniform on S, the probability of leaving at each step is about Phi(S), so after on the order of 1/Phi(S) steps, at least 3/4 of the mass is still in S.

Lovász-Simonovits Theorem

If convergence is slow, then there is a cut of low conductance.

And the cut can be found from the highest-probability nodes.

[Figure: node probabilities .137, .135, .134, .129, .112, .094, with the cut separating the high-probability nodes from the rest]

Lovász-Simonovits Theorem

From now on, every node has the same degree, d.

For all vertex sets S, and all t <= T, the probability mass in S at step t is controlled by a curve defined over the k vertices with the most probability at step t.

Lovász-Simonovits Theorem

If we start from a node in a set S of small conductance, the walk will output a set of small conductance.

Extension: the output is mostly contained in S.

(Again, the candidate sets are the k vertices with the most probability at step t.)

Speed of Lovász-Simonovits

Want run-time proportional to the number of vertices removed: local clustering on a massive graph.

If we want a cut of conductance phi, it suffices to run for a number of steps polynomial in 1/phi.

Problem: most vertices can have non-zero mass when the cut is found.

Speeding up Lovász-Simonovits

Round all small entries of the probability vector to zero.

Algorithm Nibble. Input: a start vertex v and a target set size k. Start: all mass on v. Step: diffuse, then map all values below a truncation threshold to zero.

Theorem: if v is in a set of small conductance and size k, the output is a set of small conductance that mostly overlaps it.
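A sketch of the truncated diffusion at the heart of Nibble (the threshold, step count, and sweep over prefixes are simplified here; the real algorithm sets them from the target size and conductance):

```python
import numpy as np

def nibble_sketch(A, v, eps, steps):
    """Truncated lazy diffusion from v; returns vertices ordered for the sweep."""
    d = A.sum(axis=1)
    p = np.zeros(len(d))
    p[v] = 1.0
    for _ in range(steps):
        p = 0.5 * (p + A @ (p / d))     # one lazy-walk step
        p[p < eps] = 0.0                # round small entries to zero: the speedup
    order = np.argsort(-p / d)          # sweep by probability relative to degree
    return order[p[order] > 0]          # only vertices that still hold mass
```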

The Potential Function

For integer k: I_t(k) = the sum of the k largest probabilities at step t. Linear in between these points. Concave: the slopes decrease.

Easy Inequality: for all k, I_t(k) <= I_{t-1}(k).

Fancy Inequality: I_t(k) is at most the average of I_{t-1} at two points on either side of k, separated according to the conductance of the level set. Chords of the curve that make little progress force a dominating curve that does make progress.
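The potential curve in code (using the probabilities from the picture a few slides back), with a check that its slopes decrease:

```python
import numpy as np

def potential_curve(p):
    """I(k) = sum of the k largest probabilities, for k = 0..n."""
    return np.concatenate(([0.0], np.cumsum(np.sort(p)[::-1])))

p = np.array([0.137, 0.135, 0.134, 0.129, 0.112, 0.094])
I = potential_curve(p)
slopes = np.diff(I)
assert (np.diff(slopes) <= 1e-12).all()   # concave: slopes decrease
```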

Proof of Easy Inequality

[Figure: the mass at time t-1 flowing to the mass at time t]

Order the vertices by probability mass.

If the top k at time t-1 only connect to the top k at time t, we get equality. Otherwise, some mass leaves, and we get inequality.

External Edges from Self Loops

Lemma: for every set S, and every set R of the same size, at least a Phi(S) fraction of the edges from S don't hit R.

[Figure: tight example with Phi(S) = 1/2, showing overlapping sets S and R]

Proof: when R = S, this holds by definition. Each vertex in R - S can absorb d edges from S. But each vertex of S - R has d self-loops that do not go to R.

Proof of Fancy Inequality

[Figure: mass moving from time t-1 to time t, classified as top / in / out]

I_t(k) = top + in <= top + (in + out)/2 = top/2 + (top + in + out)/2

Local Clustering

Theorem: if S is a set of small conductance, and v is a random vertex of S, then the output is a set of small conductance, mostly in S, computed in time proportional to the size of the output.

Can it be done for all conductances?

Experimental code: ClusTree

1. Cluster, crudely

2. Make trees in the clusters

3. Add edges between the trees, optimally

No recursion on reduced matrices.

Implementation: ClusTree, in Java; its timing is not included below, and could increase the total time by 20%.

Comparisons (PCG): Cholesky with drop tolerance, incomplete Cholesky, Vaidya

TAUCS [Chen, Rotkin, Toledo]

Orderings: amd, genmmd, Metis, RCM

Machine: Intel Xeon 3.06 GHz, 512 KB L2 cache, 1 MB L3 cache

Test: 2D grid, Neumann boundary; run to residual error 10^-8

[Plot: 2D Grid, Neumann. x-axis: num nodes, up to 2 x 10^6; y-axis: seconds, 0 to 120. Curves: pcg/ic/genmmd, direct/genmmd, Vaidya, clusTree]

[Plot: 2D Grid, Dirichlet. x-axis: num nodes, up to 2 x 10^6; y-axis: seconds, 0 to 100. Curves: direct/genmmd, Vaidya, pcg/mic/rcm, clusTree]

Impact of boundary conditions: 2D grid, Dirichlet vs. 2D grid, Neumann.

[Plot: 2D Unstructured Delaunay Mesh, Neumann. x-axis: num nodes, up to 15 x 10^5; y-axis: seconds, 0 to 120. Curves: Vaidya, direct/Metis, pcg/ic/genmmd, clusTree]

2D Unstructured Delaunay Mesh

[Plot: 2D Unstructured Delaunay Mesh, Dirichlet. x-axis: num nodes, up to 15 x 10^5; y-axis: seconds, 0 to 120. Curves: Vaidya, direct/Metis, pcg/mic/genmmd, clusTree]

Impact of boundary conditions: Dirichlet vs. Neumann.

Future Work

Practical Local Clustering

More Sparsification

Other physical problems:

Cheeger's inequality

Implications for combinatorics, and spectral hypergraph theory

To learn more

"Nearly-Linear Time Algorithms for Graph Partitioning, Sparsification, and Solving Linear Systems" (STOC '04, arXiv). It will be split into two papers: numerical and combinatorial.

My lecture notes for Spectral Graph Theory and its Applications.