Transcript of "Fast Regression Algorithms Using Spectral Graph Theory" by Richard Peng (73 slides).

Page 1

Fast Regression Algorithms Using Spectral Graph Theory

Richard Peng

Page 2

OUTLINE

• Regression: why and how
• Spectra: fast solvers
• Graphs: tree embeddings

Page 3

LEARNING / INFERENCE

Find (hidden) pattern in (noisy) data

(Figure: input signal s and the recovered output)

Page 4

REGRESSION

Minimize: |x|_p
Subject to: constraints on x
• p ≥ 1: convex
• Convex constraints, e.g. linear equalities

Page 5

APPLICATION 0: LASSO

Widely used in practice:
• Structured output
• Robust to noise

[Tibshirani `96]:
Min |x|_1
s.t. Ax = s
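As an illustration (not from the talk), this L1 / basis-pursuit problem can be solved as a linear program: split x = p − q with p, q ≥ 0, so that |x|_1 = Σp + Σq. A minimal sketch, with a made-up random A and sparse signal:

```python
import numpy as np
from scipy.optimize import linprog

# Basis pursuit: min |x|_1  s.t.  A x = s.
# LP reformulation: x = p - q with p, q >= 0, minimize sum(p) + sum(q).
rng = np.random.default_rng(0)
n, d = 4, 8                                  # underdetermined system
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[[1, 5]] = [2.0, -1.0]                 # sparse ground truth
s = A @ x_true

c = np.ones(2 * d)                           # objective: sum(p) + sum(q)
A_eq = np.hstack([A, -A])                    # A(p - q) = s
res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None))
x = res.x[:d] - res.x[d:]
print(np.round(x, 3))
```

This tiny instance is fine for linprog's default solver; at the scales discussed later in the talk, specialized first-order or interior point methods are used instead.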

Page 6

APPLICATION 1: IMAGES

No bears were harmed in the making of these slides

Poisson image processing

Min Σ_{i~j∈E} (x_i - x_j - s_{i~j})^2

Page 7

APPLICATION 2: MIN CUT

Remove fewest edges to separate vertices s and t

Min Σ_{ij∈E} |x_i - x_j|

s.t. x_s = 0, x_t = 1

(Figure: s-side vertices labeled 0, t-side vertices labeled 1)

Fractional solution = integral solution
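A hedged sketch of this LP (the 4-vertex graph is my own toy example): introduce y_e ≥ |x_i − x_j| per edge and minimize Σ y_e with scipy's linprog. On this graph the LP optimum comes out integral, matching the claim above.

```python
import numpy as np
from scipy.optimize import linprog

# Min s-t cut as an LP:  min sum_e y_e  with  y_e >= |x_i - x_j|,
# x_s = 0, x_t = 1.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
n, m = 4, len(edges)
s_node, t_node = 0, 3

# Variable order: [x_0..x_3, y_0..y_4]
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub, b_ub = [], []
for k, (i, j) in enumerate(edges):
    for sign in (1, -1):                 # y_k >= +-(x_i - x_j)
        row = np.zeros(n + m)
        row[i], row[j], row[n + k] = sign, -sign, -1.0
        A_ub.append(row)
        b_ub.append(0.0)
A_eq = np.zeros((2, n + m))
A_eq[0, s_node], A_eq[1, t_node] = 1.0, 1.0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[0.0, 1.0],
              bounds=[(0, 1)] * n + [(0, None)] * m)
print(res.fun)   # value of the minimum cut (2 on this graph)
```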

Page 8

REGRESSION ALGORITHMS

Convex optimization:
• 1940~1960: simplex, tractable
• 1960~1980: ellipsoid, poly time
• 1980~2000: interior point, efficient: Õ(m^{1/2}) interior steps
• m = # non-zeros
• Õ hides log factors

Page 9

EFFICIENCY MATTERS

• m > 10^6 for most images
• Even bigger (10^9): videos, 3D medical data

Page 10

KEY SUBROUTINE

Each of the Õ(m^{1/2}) interior point steps finds a step direction

Linear system solves: Ax = b

Page 11

MORE REASONS FOR FAST SOLVERS

[Boyd-Vandenberghe `04], Figure 11.20: "The growth in the average number of Newton iterations (on randomly generated SDPs)… is very small"

Page 12

LINEAR SYSTEM SOLVERS

• [1st century CE] Gaussian elimination: O(m^3)
• [Strassen `69]: O(m^2.8)
• [Coppersmith-Winograd `90]: O(m^2.3755)
• [Stothers `10]: O(m^2.3737)
• [Vassilevska Williams `11]: O(m^2.3727)

Total: > m^2

Page 13

NOT FAST, NOT USED

• Preferred in practice: coordinate descent, subgradient methods
• Solution quality traded for time

Page 14

FAST GRAPH BASED L2 REGRESSION[SPIELMAN-TENG ‘04]

Input: linear system Ax = b where A is related to graphs
Output: solution to Ax = b
Runtime: nearly linear, Õ(m)

More in 12 slides

Page 15

GRAPHS USING ALGEBRA

Fast convergence + low cost per step = state-of-the-art algorithms

Ax=b

Page 16

LAPLACIAN PARADIGM

[Daitch-Spielman `08]: min cost flow
[Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut

Ax=b

Page 17

EXTENSION 1

[Chin-Mądry-Miller-P `12]:

regression, image processing, grouped L2

Page 18

EXTENSION 2

[Kelner-Miller-P `12]: k-commodity flow
Dual: k-variate labeling of graphs

Page 19

EXTENSION 3

[Miller-P `13]: faster for structured images / separable graphs

Page 20

NEED: FAST LINEAR SYSTEM SOLVERS

Implications of fast solvers:
• Fast regression routines
• Parallel, work-efficient graph algorithms

Page 21

OTHER APPLICATIONS

• [Tutte `66]: planar embedding
• [Boman-Hendrickson-Vavasis `04]: PDEs
• [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator

Page 22

OUTLINE

• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings

Page 23

PROBLEM

Given: matrix A, vector b
Size of A: n-by-n, m non-zeros

Ax=b

Page 24

SPECIAL STRUCTURE OF A

A = Deg – Adj
• Deg: diag(degree)
• Adj: adjacency matrix

A_ij = deg(i) if i = j, -w(ij) otherwise

[Gremban-Miller `96]: extensions to SDD matrices
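A minimal sketch of building this matrix (the toy weighted graph below is invented for illustration):

```python
import numpy as np

# Graph Laplacian A = Deg - Adj for a weighted graph:
# diagonal holds weighted degrees, off-diagonals hold -w(ij).
def laplacian(n, edges):
    """edges: list of (i, j, w) with w > 0."""
    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] -= w
        A[j, i] -= w        # off-diagonal: -w(ij)
        A[i, i] += w
        A[j, j] += w        # diagonal: weighted degree
    return A

A = laplacian(3, [(0, 1, 2.0), (1, 2, 1.0)])
print(A)
```

Row sums are zero by construction, and the matrix is positive semidefinite, which is what the solvers below exploit.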

Page 25

UNSTRUCTURED GRAPHS

• Social networks
• Intermediate systems of other algorithms are almost adversarial

Page 26

NEARLY LINEAR TIME SOLVERS[SPIELMAN-TENG ‘04]

Input: n-by-n graph Laplacian A with m non-zeros, vector b, where b = Ax for some x
Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: nearly linear, O(m log^c n log(1/ε)) expected

• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(yᵀAy).

Page 27

HOW MANY LOGS

Runtime: O(m log^c n log(1/ε))

Value of c: I don't know

[Spielman]: c ≤ 70
[Koutis]: c ≤ 15
[Miller]: c ≤ 32
[Teng]: c ≤ 12
[Orecchia]: c ≤ 6

When n = 10^6, log^6 n > 10^6

Page 28

PRACTICAL NEARLY LINEAR TIME SOLVERS[KOUTIS-MILLER-P `10]

Input: n-by-n graph Laplacian A with m non-zeros, vector b, where b = Ax for some x
Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: O(m log^2 n log(1/ε))

• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(yᵀAy).

Page 29

PRACTICAL NEARLY LINEAR TIME SOLVERS[KOUTIS-MILLER-P `11]

Input: n-by-n graph Laplacian A with m non-zeros, vector b, where b = Ax for some x
Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: O(m log n log(1/ε))

• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(yᵀAy).

Page 30

STAGES OF THE SOLVER

• Iterative Methods
• Spectral Sparsifiers
• Low Stretch Spanning Trees

Page 31

ITERATIVE METHODS

Numerical analysis: can solve systems in A by iteratively solving a spectrally similar, but easier, B
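One concrete instance of this idea, as a sketch under simplifying assumptions: preconditioned Richardson iteration, where a diagonal B stands in for the tree-based preconditioners built later in the talk.

```python
import numpy as np

# Preconditioned iteration: to solve A x = b, repeatedly correct with
# a solve in an easier, spectrally similar matrix B:
#     x <- x + B^{-1} (b - A x)
# If A ≺ B ≺ kA, the error contracts geometrically (rate depends on k).
def precond_richardson(A, B, b, iters=80):
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x += np.linalg.solve(B, b - A @ x)   # stand-in for a fast B-solve
    return x

# Toy SDD system; B = diagonal of A (Jacobi) plays the "easier" role here.
A = np.array([[3., -1., 0.], [-1., 3., -1.], [0., -1., 3.]])
B = np.diag(np.diag(A))
b = np.array([1., 2., 3.])
x = precond_richardson(A, B, b)
print(np.round(A @ x - b, 8))   # residual, essentially zero
```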

Page 32

WHAT IS SPECTRALLY SIMILAR?

A ≺ B ≺ kA for some small k

• Ideas from scalars hold!
• A ≺ B: for any vector x, |x|_A^2 ≤ |x|_B^2

[Vaidya `91]: Since A is a graph, B should be too! (i.e., since G is a graph, H should be too)

Page 33

‘EASIER’ H

Goal: H with fewer edges that's similar to G

Ways of being easier:
• Fewer vertices
• Fewer edges

Can reduce vertex count if edge count is small

Page 34

GRAPH SPARSIFIERS

Sparse equivalents of graphs that preserve something:
• Spanners: distance, diameter
• Cut sparsifiers: all cuts
• What we need: spectrum

Page 35

WHAT WE NEED: ULTRASPARSIFIERS

[Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n/k) edges imply solvers with O(m log^p n) running time.

• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n-1+O(m log^p n/k) edges
• Goal: G ≺ H ≺ kG

Page 36

EXAMPLE: COMPLETE GRAPH

O(n log n) random edges (with scaling) suffice w.h.p.

Page 37

GENERAL GRAPH SAMPLING MECHANISM

• For edge e, flip coin with Pr(keep) = P(e)
• Rescale kept edges to maintain expectation

Expected number of edges kept: ∑_e P(e)

Also need to prove concentration
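A quick numerical check of this mechanism (edge weights and keep-probabilities invented for the example): keeping edge e with probability P(e) and rescaling kept weights by 1/P(e) preserves every edge weight, and hence the Laplacian, in expectation.

```python
import numpy as np

# Sampling with rescaling: keep edge e with probability P(e); if kept,
# multiply its weight by 1/P(e).  Expected weight is unchanged.
rng = np.random.default_rng(1)
w = np.array([1.0, 2.0, 4.0, 0.5])       # edge weights
P = np.array([0.9, 0.5, 0.25, 1.0])      # keep probabilities

trials = 20000
total = np.zeros_like(w)
for _ in range(trials):
    keep = rng.random(len(w)) < P
    total += np.where(keep, w / P, 0.0)  # rescaled sampled weights
print(np.round(total / trials, 2))       # close to w, as expected
```

Concentration (the harder part mentioned above) is what requires choosing the probabilities P(e) carefully, as the next slides do.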

Page 38

EFFECTIVE RESISTANCE

• View the graph as a circuit
• R(u,v) = resistance of the circuit when passing 1 unit of current from u to v

Page 39

EE101

Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector; R(u,v) = x_u - x_v.
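As a sketch (the 3-vertex path below is my own toy example), this can be computed directly with the pseudoinverse, since the Laplacian is singular:

```python
import numpy as np

# Effective resistance: solve L x = e_uv (signed indicator of u, v),
# then R(u, v) = x_u - x_v.  Use pinv because L is singular.
def effective_resistance(L, u, v):
    e = np.zeros(L.shape[0])
    e[u], e[v] = 1.0, -1.0
    x = np.linalg.pinv(L) @ e
    return x[u] - x[v]

# Two unit edges in series: the series rule gives R(0, 2) = 1 + 1 = 2.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
print(effective_resistance(L, 0, 2))   # → 2.0
```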

Page 40

(REMEDIAL?) EE101

• Single edge: R(e) = 1/w(e)
• Series: R(u,v) = R(e_1) + … + R(e_l)

Examples: a single edge u–v of weight w_1 has R(u,v) = 1/w_1; a two-edge path with weights w_1, w_2 has R(u,v) = 1/w_1 + 1/w_2

Page 41

SPECTRAL SPARSIFICATION BY EFFECTIVE RESISTANCE

[Spielman-Srivastava `08]: setting P(e) to w(e)·R(e)·O(log n) gives G ≺ H ≺ 2G*

*Ignoring probabilistic issues

[Foster `49]: ∑_e w(e)R(e) = n-1
→ spectral sparsifier with O(n log n) edges

Ultrasparsifier? Solver???

Page 42

THE CHICKEN AND EGG PROBLEM

How to find effective resistance?

[Spielman-Srivastava `08]: use solver
[Spielman-Teng `04]: need sparsifier

Page 43

OUR WORK AROUND

• Use upper bounds on effective resistance, R'(u,v)
• Modify the problem

Page 44

RAYLEIGH’S MONOTONICITY LAW

Rayleigh's Monotonicity Law: R(u,v) can only increase when edges are removed

Calculate effective resistance w.r.t. a tree T

Page 45

SAMPLING PROBABILITIES ACCORDING TO TREE

Sample probability: edge weight times effective resistance of tree path (the edge's stretch)

Goal: small total stretch

Page 46

GOOD TREES EXIST

Every graph has a spanning tree with total stretch O(m log n) (hiding loglog n)

∑_e w(e)R'(e) = O(m log n)

O(m log^2 n) edges after sampling: too many!

More in 12 slides (again!)

Page 47

‘GOOD’ TREE???

Unit weight case: stretch ≥ 1 for all edges

(Example: an off-tree edge whose tree path has two edges gets stretch = 1+1 = 2)

Page 48

WHAT ARE WE MISSING?

• Need: G ≺ H ≺ kG, with n-1+O(m log^p n/k) edges
• Generated: G ≺ H ≺ 2G, with n-1+O(m log^2 n) edges

Haven't used k!

Page 49

USE K, SOMEHOW

• Tree is good!
• Increase weights of tree edges by a factor of k

G ≺ G' ≺ kG

Page 50

RESULT

• Tree heavier by a factor of k
• Tree-path effective resistances decrease by a factor of k

Stretch = 1/k + 1/k = 2/k

Page 51

NOW SAMPLE?

Expected edges in H:
• Tree edges: n-1
• Off-tree edges: O(m log^2 n/k)

Total: n-1+O(m log^2 n/k)

Page 52

BUT WE CHANGED G!

G ≺ G' ≺ kG
G' ≺ H ≺ 2G'

⟹ G ≺ H ≺ 2kG

Page 53

WHAT WE NEED: ULTRASPARSIFIERS

[Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n/k) edges imply solvers with O(m log^p n) running time.

• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n-1+O(m log^p n/k) edges
• Goal: G ≺ H ≺ kG

Achieved: G ≺ H ≺ 2kG with n-1+O(m log^2 n/k) edges

Page 54

PSEUDOCODE OF O(m log n) SOLVER

• Input: graph Laplacian G
• Compute low stretch tree T of G
• Scale up T by a factor of O(log^2 n)
• H ← G + T
• H ← SampleT(H)
• Solve G by iterating on H and solving recursively, but reuse T

Page 55

EXTENSIONS / GENERALIZATIONS

• [Koutis-Levin-P `12]: sparsify mildly dense graphs in O(m) time
• [Miller-P `12]: general matrices: find a 'simpler' matrix that's similar, in O(m + n^{2.38+α}) time

Page 56

SUMMARY OF SOLVERS

• Spectral graph theory allows one to find similar, easier-to-solve graphs
• Backbone: good trees

Page 57

SOLVERS USING GRAPH THEORY

Fast solvers for graph Laplacians use combinatorial graph theory

Ax=b

Page 58

OUTLINE

• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings

Page 59

LOW STRETCH SPANNING TREE

Sampling probability: edge weight times effective resistance of tree path
Unit weight case: length of tree path

Low stretch spanning tree: small total stretch
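For unit weights, an edge's stretch is just the tree-path length between its endpoints. A small sketch (the 4-cycle and its path tree are invented for illustration):

```python
from collections import deque

# Stretch of a non-tree edge (u, v) in an unweighted graph: length of
# the tree path between u and v.  Total stretch sums over all edges.
def tree_path_len(tree_adj, u, v):
    prev = {u: None}
    q = deque([u])
    while q:                      # BFS in the tree from u until v
        a = q.popleft()
        if a == v:
            break
        for b in tree_adj[a]:
            if b not in prev:
                prev[b] = a
                q.append(b)
    d, a = 0, v                   # walk back to count tree edges
    while prev[a] is not None:
        d += 1
        a = prev[a]
    return d

# 4-cycle with the path tree 0-1-2-3: the off-tree edge (0, 3) is
# stretched along the whole path, giving stretch 3.
tree_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(tree_path_len(tree_adj, 0, 3))   # → 3
```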

Page 60

DIFFERENT THAN USUAL TREES

n^{1/2}-by-n^{1/2} unit weighted mesh

The 'haircomb' is both a shortest path tree and a max weight spanning tree, but while some edges have stretch O(1), others have stretch Ω(n^{1/2}), for total stretch Ω(n^{3/2})

Page 61

A BETTER TREE FOR THE GRID

Recursive ‘C’

Page 62

LOW STRETCH SPANNING TREES

[Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08]: any graph has a spanning tree with total stretch O(m log n)

Hiding loglogn

Page 63

ISSUE: RUNNING TIME

Algorithms given by [Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08] take O(n log^2 n + m log n) time

Reason: O(log n) shortest path computations

Page 64

SPEED UP

[Koutis-Miller-P `11]:
• Round edge weights to powers of 2
• k = log n distinct weights, total work = O(m log n)

[Orlin-Madduri-Subramani-Williamson `10]: shortest path on graphs with k distinct weights can run in O(m log_{m/n} k) time

Hiding loglog n; we actually improve these

Page 65

PARALLEL ALGORITHM?

• [Blelloch-Gupta-Koutis-Miller-P-Tangwongsan `11]: current framework parallelizes to O(m^{1/3+α}) depth
• Combine with Laplacian paradigm → fast parallel graph algorithms

Page 66

PARALLEL GRAPH ALGORITHMS?

• Before this work: parallel time > state-of-the-art sequential time
• Our result: parallel work close to sequential, and O(m^{2/3}) time

Page 67

FUNDAMENTAL PROBLEM

Long standing open problem: theoretical speedups for BFS / shortest path in directed graphs

Sequential algorithms are too fast!

Page 68

PARALLEL ALGORITHM?

First step of framework by [Elkin-Emek-Spielman-Teng `05]: shortest path

Page 69

PARALLEL TREE EMBEDDING

• Workaround: use earlier algorithm by [Alon-Karp-Peleg-West `95]
• Idea: repeated clustering
• Based on ideas from [Cohen `93, `00] for approximating shortest paths

Page 70

PARALLEL TREE EMBEDDING

Page 71

THE BIG PICTURE

• Need fast linear system solvers for graph regression
• Need combinatorial graph algorithms for fast solvers


Page 72

ONGOING / FUTURE WORK

• Better regression?
• Faster/parallel solvers?
• Sparse approximate (pseudo)inverse?
• Other types of systems?

Page 73

THANK YOU!

Questions?