Fast Regression Algorithms Using Spectral Graph Theory


Transcript of Fast Regression Algorithms Using Spectral Graph Theory

Page 1: Fast Regression Algorithms Using Spectral Graph Theory

Fast Regression Algorithms Using Spectral Graph Theory

Richard Peng

Page 2: Fast Regression Algorithms Using Spectral Graph Theory

OUTLINE

• Regression: why and how
• Spectra: fast solvers
• Graphs: tree embeddings

Page 3: Fast Regression Algorithms Using Spectral Graph Theory

LEARNING / INFERENCE

Find (hidden) pattern in (noisy) data

Input signal, s → output

Page 4: Fast Regression Algorithms Using Spectral Graph Theory

REGRESSION

• p ≥ 1: convex
• Convex constraints

e.g. linear equalities

Minimize: |x|_p

Subject to: constraints on x

Page 5: Fast Regression Algorithms Using Spectral Graph Theory

APPLICATION 0: LASSO

Widely used in practice:
• Structured output
• Robust to noise

[Tibshirani `96]: Min |x|_1

s.t. Ax = s
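
This L1 problem can be posed as a linear program by splitting x into positive and negative parts. Below is a minimal sketch (not from the talk): the toy matrix A and signal s are made up, and scipy's generic LP solver stands in for the specialized regression algorithms the talk is about.

```python
# Minimal sketch: min |x|_1 s.t. Ax = s as a linear program.
# Write x = u - v with u, v >= 0; minimize sum(u) + sum(v).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 30))            # toy underdetermined system
x_true = np.zeros(30); x_true[[3, 17]] = [1.0, -2.0]
s = A @ x_true

n = A.shape[1]
c = np.ones(2 * n)                           # objective value = |x|_1
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=s, bounds=(0, None))
x = res.x[:n] - res.x[n:]
print(np.round(x, 2))                        # typically recovers the sparse x_true
```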

Page 6: Fast Regression Algorithms Using Spectral Graph Theory

APPLICATION 1: IMAGES

No bears were harmed in the making of these slides

Poisson image processing

Min Σ_{i~j ∈ E} (x_i - x_j - s_{i~j})^2
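
This objective is least squares in the edge-vertex incidence matrix B, so its minimizer solves the Laplacian system (BᵀB)x = Bᵀs. A minimal sketch on a hand-built 4-vertex cycle (dense lstsq stands in for the fast solvers discussed later):

```python
# Least squares in the incidence matrix B; the optimum x solves (B^T B) x = B^T s,
# and B^T B is exactly the graph Laplacian.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]        # a small cycle
s = np.array([1.0, 0.5, -0.5, -1.2])            # target differences x_i - x_j

n = 4
B = np.zeros((len(edges), n))
for k, (i, j) in enumerate(edges):
    B[k, i], B[k, j] = 1.0, -1.0                # one row per edge

L = B.T @ B                                     # graph Laplacian
x, *_ = np.linalg.lstsq(L, B.T @ s, rcond=None)  # L is singular; lstsq handles it
print(np.round(B @ x, 3))                       # best achievable differences
```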

Page 7: Fast Regression Algorithms Using Spectral Graph Theory

APPLICATION 2: MIN CUT

Remove fewest edges to separate vertices s and t

Min Σ_{ij ∈ E} |x_i - x_j|

s.t. x_s = 0, x_t = 1


Fractional solution = integral solution

Page 8: Fast Regression Algorithms Using Spectral Graph Theory

REGRESSION ALGORITHMS

Convex optimization:
• 1940~1960: simplex, tractable
• 1960~1980: ellipsoid, poly time
• 1980~2000: interior point, efficient: Õ(m^{1/2}) interior steps

• m = # non-zeros
• Õ hides log factors


Page 9: Fast Regression Algorithms Using Spectral Graph Theory

EFFICIENCY MATTERS

• m > 10^6 for most images
• Even bigger (10^9): videos, 3D medical data

Page 10: Fast Regression Algorithms Using Spectral Graph Theory

KEY SUBROUTINE

Õ(m^{1/2}) interior steps; each step of interior point algorithms finds a step direction

Linear system solves

Page 11: Fast Regression Algorithms Using Spectral Graph Theory

MORE REASONS FOR FAST SOLVERS

[Boyd-Vandenberghe `04], Figure 11.20: The growth in the average number of Newton iterations (on randomly generated SDPs)… is very small

Page 12: Fast Regression Algorithms Using Spectral Graph Theory

LINEAR SYSTEM SOLVERS

• [1st century CE] Gaussian Elimination: O(m^3)
• [Strassen `69]: O(m^{2.8})
• [Coppersmith-Winograd `90]: O(m^{2.3755})
• [Stothers `10]: O(m^{2.3737})
• [Vassilevska Williams `11]: O(m^{2.3727})

Total: > m^2

Page 13: Fast Regression Algorithms Using Spectral Graph Theory

NOT FAST NOT USED:

• Preferred in practice: coordinate descent, subgradient methods
• Solution quality traded for time

Page 14: Fast Regression Algorithms Using Spectral Graph Theory

FAST GRAPH BASED L2 REGRESSION [SPIELMAN-TENG `04]

Input: linear system Ax = b where A is related to graphs
Output: solution to Ax = b
Runtime: nearly linear, Õ(m)

More in 12 slides

Page 15: Fast Regression Algorithms Using Spectral Graph Theory

GRAPHS USING ALGEBRA

Fast convergence + low cost per step = state-of-the-art algorithms

Ax=b

Page 16: Fast Regression Algorithms Using Spectral Graph Theory

LAPLACIAN PARADIGM

[Daitch-Spielman `08]: min-cost flow
[Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut

Ax=b

Page 17: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSION 1

[Chin-Mądry-Miller-P `12]:

regression, image processing, grouped L2

Page 18: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSION 2

[Kelner-Miller-P `12]: k-commodity flow
Dual: k-variate labeling of graphs

Page 19: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSION 3

[Miller-P `13]: faster for structured images / separable graphs

Page 20: Fast Regression Algorithms Using Spectral Graph Theory

NEED: FAST LINEAR SYSTEM SOLVERS

Implication of fast solvers:
• Fast regression routines
• Parallel, work-efficient graph algorithms


Page 21: Fast Regression Algorithms Using Spectral Graph Theory

OTHER APPLICATIONS

• [Tutte `66]: planar embedding
• [Boman-Hendrickson-Vavasis `04]: PDEs
• [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator

Page 22: Fast Regression Algorithms Using Spectral Graph Theory

OUTLINE

• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings

Page 23: Fast Regression Algorithms Using Spectral Graph Theory

PROBLEM

Given: matrix A, vector b
Size of A:
• n-by-n
• m non-zeros

Ax=b

Page 24: Fast Regression Algorithms Using Spectral Graph Theory

SPECIAL STRUCTURE OF A

A = Deg – Adj
• Deg: diag(degree)
• Adj: adjacency matrix

[Gremban-Miller `96]: extensions to SDD matrices

A_ij = deg(i) if i = j, -w(ij) otherwise
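
A minimal sketch of this structure on a made-up 3-vertex weighted graph (dense matrices for clarity):

```python
# Toy weighted graph; A = Deg - Adj has weighted degrees on the diagonal and
# -w(ij) off the diagonal, so every row sums to zero.
import numpy as np

edges = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 3.0}     # (u, v) -> weight
n = 3

Adj = np.zeros((n, n))
for (u, v), w in edges.items():
    Adj[u, v] = Adj[v, u] = w
Deg = np.diag(Adj.sum(axis=1))                      # weighted degrees
A = Deg - Adj
print(A)                                            # symmetric, rows sum to 0
```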

Page 25: Fast Regression Algorithms Using Spectral Graph Theory

UNSTRUCTURED GRAPHS

• Social network
• Intermediate systems of other algorithms are almost adversarial

Page 26: Fast Regression Algorithms Using Spectral Graph Theory

NEARLY LINEAR TIME SOLVERS [SPIELMAN-TENG `04]

Input: n-by-n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: nearly linear, O(m log^c n log(1/ε)) expected

• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(y^T A y).
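
For concreteness, a tiny helper (not from the talk) for the A-norm used in the error guarantee:

```python
import numpy as np

def a_norm(A, y):
    """|y|_A = sqrt(y^T A y) for a PSD matrix A."""
    return float(np.sqrt(y @ (A @ y)))

# relative error of an approximate solution x_apx for Ax = b:
#   a_norm(A, x - x_apx) / a_norm(A, x)
```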

Page 27: Fast Regression Algorithms Using Spectral Graph Theory

HOW MANY LOGS

Runtime: O(m log^c n log(1/ε))

Value of c: I don't know

[Spielman]: c ≤ 70
[Koutis]: c ≤ 15
[Miller]: c ≤ 32
[Teng]: c ≤ 12
[Orecchia]: c ≤ 6

When n = 10^6, log^6 n > 10^6

Page 28: Fast Regression Algorithms Using Spectral Graph Theory

PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `10]

Input: n-by-n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: O(m log^2 n log(1/ε))

• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(y^T A y).

Page 29: Fast Regression Algorithms Using Spectral Graph Theory

PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `11]

Input: n-by-n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A
Runtime: O(m log n log(1/ε))

• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(y^T A y).

Page 30: Fast Regression Algorithms Using Spectral Graph Theory

STAGES OF THE SOLVER

• Iterative Methods
• Spectral Sparsifiers
• Low Stretch Spanning Trees

Page 31: Fast Regression Algorithms Using Spectral Graph Theory

ITERATIVE METHODS

Numerical analysis: can solve systems in A by iteratively solving a spectrally similar, but easier, B
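
As a concrete stand-in, here is the simplest such iteration, preconditioned Richardson, sketched below; the actual solvers use more refined iterations (e.g. Chebyshev or conjugate gradient), so treat this only as an illustration of "solve A by repeatedly solving B":

```python
# Generic preconditioned Richardson iteration; B_solve(r) should apply an
# (approximate) inverse of the easier matrix B.  Convergence and its rate
# depend on how spectrally close B is to A.
import numpy as np

def richardson(A, B_solve, b, iters=100):
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = x + B_solve(b - A @ x)   # correct x using a solve in B
    return x
```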

Page 32: Fast Regression Algorithms Using Spectral Graph Theory

WHAT IS SPECTRALLY SIMILAR?

A ≺ B ≺ kA for some small k

• Ideas from scalars hold!
• A ≺ B: for any vector x, |x|_A^2 ≤ |x|_B^2

[Vaidya `91]: Since A is a graph, B should be too!
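
A rough way to probe this ordering numerically on small dense matrices (a sanity check over random vectors, not a certificate; toy helper, not from the talk):

```python
# A "≺" B means x^T A x <= x^T B x for all x, so the maximum ratio of the two
# quadratic forms over random vectors should be at most 1.
import numpy as np

def max_form_ratio(A, B, trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.standard_normal(A.shape[0])
        qa, qb = x @ A @ x, x @ B @ x
        if qb > 1e-12:
            worst = max(worst, qa / qb)
    return worst
```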

Page 33: Fast Regression Algorithms Using Spectral Graph Theory

`EASIER’ H

Goal: H with fewer edges that’s similar to G

Ways of being easier:
• Fewer vertices
• Fewer edges

Can reduce vertex count if edge count is small

Page 34: Fast Regression Algorithms Using Spectral Graph Theory

GRAPH SPARSIFIERS

Sparse equivalents of graphs that preserve something

• Spanners: distance, diameter.
• Cut sparsifiers: all cuts.
• What we need: spectrum

Page 35: Fast Regression Algorithms Using Spectral Graph Theory

WHAT WE NEED: ULTRASPARSIFIERS

[Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n / k) edges imply solvers with O(m log^p n) running time.

• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n-1+O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG

Page 36: Fast Regression Algorithms Using Spectral Graph Theory

EXAMPLE: COMPLETE GRAPH

O(n log n) random edges (with scaling) suffice w.h.p.

Page 37: Fast Regression Algorithms Using Spectral Graph Theory

GENERAL GRAPH SAMPLING MECHANISM

• For each edge e, flip a coin: Pr(keep) = P(e)
• Rescale to maintain expectation

Expected number of edges kept: ∑_e P(e)

Also need to prove concentration
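
A minimal sketch of this keep-and-rescale step (toy dictionaries; the probabilities P(e) are generic caller-supplied values, not the talk's specific choice):

```python
# Keep edge e independently with probability P(e); divide its weight by P(e)
# when kept so that the expected weight of every edge is unchanged.
import numpy as np

def sample_edges(edges, probs, seed=0):
    """edges: dict (u, v) -> weight; probs: dict (u, v) -> keep probability."""
    rng = np.random.default_rng(seed)
    kept = {}
    for e, w in edges.items():
        p = min(1.0, probs[e])
        if rng.random() < p:
            kept[e] = w / p        # rescale to maintain expectation
    return kept
```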

Page 38: Fast Regression Algorithms Using Spectral Graph Theory

EFFECTIVE RESISTANCE

• View the graph as a circuit
• R(u,v) = pass 1 unit of current from u to v, measure the resistance of the circuit


Page 39: Fast Regression Algorithms Using Spectral Graph Theory

EE101

Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector of the pair (+1 at u, -1 at v); then R(u,v) = x_u - x_v.

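A minimal sketch of this computation, using a dense pseudo-inverse as a stand-in for a fast solver:

```python
# Effective resistance from one Laplacian solve; e_uv has +1 at u and -1 at v.
import numpy as np

def effective_resistance(G, u, v):
    e_uv = np.zeros(G.shape[0])
    e_uv[u], e_uv[v] = 1.0, -1.0
    x = np.linalg.pinv(G) @ e_uv       # G is singular, so use the pseudo-inverse
    return x[u] - x[v]

# e.g. on a unit-weight path u - w - v this returns 1/1 + 1/1 = 2 (series rule)
```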

Page 40: Fast Regression Algorithms Using Spectral Graph Theory

(REMEDIAL?) EE101

• Single edge: R(e) = 1/w(e)
• Series: R(u, v) = R(e_1) + … + R(e_l)

Example: single edge u–v with weight w_1: R(u, v) = 1/w_1
Example: path u–v through two edges with weights w_1, w_2: R(u, v) = 1/w_1 + 1/w_2

Page 41: Fast Regression Algorithms Using Spectral Graph Theory

SPECTRAL SPARSIFICATION BY EFFECTIVE RESISTANCE

[Spielman-Srivastava `08]: setting P(e) to w(e)·R(e)·O(log n) gives G ≺ H ≺ 2G*

*Ignoring probabilistic issues

[Foster `49]: ∑_e w(e)R(e) = n-1 → spectral sparsifier with O(n log n) edges

Ultrasparsifier? Solver???

Page 42: Fast Regression Algorithms Using Spectral Graph Theory

THE CHICKEN AND EGG PROBLEM

How to find effective resistance?

[Spielman-Srivastava `08]: use the solver
[Spielman-Teng `04]: need the sparsifier

Page 43: Fast Regression Algorithms Using Spectral Graph Theory

OUR WORKAROUND

• Use upper bounds of effective resistance, R'(u,v)
• Modify the problem

Page 44: Fast Regression Algorithms Using Spectral Graph Theory

RAYLEIGH’S MONOTONICITY LAW

Rayleigh's Monotonicity Law: R(u, v) can only increase when edges are removed

Calculate effective resistance w.r.t. a tree T

Page 45: Fast Regression Algorithms Using Spectral Graph Theory

SAMPLING PROBABILITIES ACCORDING TO TREE

Sample probability: edge weight times effective resistance of the tree path (the stretch of the edge)

Goal: small total stretch
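
A small sketch of computing this quantity for one off-tree edge, assuming networkx is available and the tree T is stored as a weighted nx.Graph:

```python
# Stretch of an off-tree edge (u, v) with weight w_uv relative to a spanning
# tree T: w_uv times the resistance of the unique u-v path in T.
import networkx as nx

def stretch(T, u, v, w_uv):
    path = nx.shortest_path(T, u, v)                       # the unique tree path
    r_tree = sum(1.0 / T[a][b]["weight"] for a, b in zip(path, path[1:]))
    return w_uv * r_tree

# With unit weights this is just the number of tree edges on the path,
# matching the "stretch = 1 + 1 = 2" example on slide 47.
```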

Page 46: Fast Regression Algorithms Using Spectral Graph Theory

GOOD TREES EXIST

Every graph has a spanning tree with total stretch O(m log n)

∑_e w(e)R'(e) = O(m log n) → O(m log^2 n) edges, too many!

Hiding loglog n factors

More in 12 slides (again!)

Page 47: Fast Regression Algorithms Using Spectral Graph Theory

‘GOOD’ TREE???

Unit weight case: stretch ≥ 1 for all edges

Stretch = 1 + 1 = 2

Page 48: Fast Regression Algorithms Using Spectral Graph Theory

WHAT ARE WE MISSING?

• Need: G ≺ H ≺ kG, with n-1+O(m log^p n / k) edges
• Generated: G ≺ H ≺ 2G, with n-1+O(m log^2 n) edges

Haven't used k!

Page 49: Fast Regression Algorithms Using Spectral Graph Theory

USE K, SOMEHOW

• Tree is good!
• Increase weights of tree edges by a factor of k

G ≺ G' ≺ kG

Page 50: Fast Regression Algorithms Using Spectral Graph Theory

RESULT

• Tree heavier by a factor of k
• Tree effective resistances decrease by a factor of k

Stretch = 1/k + 1/k = 2/k

Page 51: Fast Regression Algorithms Using Spectral Graph Theory

NOW SAMPLE?

Expected in H:
• Tree edges: n-1
• Off-tree edges: O(m log^2 n / k)

Total: n-1+O(m log^2 n / k)

Page 52: Fast Regression Algorithms Using Spectral Graph Theory

BUT WE CHANGED G!

G ≺ G' ≺ kG, G' ≺ H ≺ 2G'

→ G ≺ H ≺ 2kG

Page 53: Fast Regression Algorithms Using Spectral Graph Theory

WHAT WE NEED: ULTRASPARSIFIERS

[Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n / k) edges imply solvers with O(m log^p n) running time.

• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n-1+O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG

G ≺ H ≺ 2kG, n-1+O(m log^2 n / k) edges

Page 54: Fast Regression Algorithms Using Spectral Graph Theory

PSEUDOCODE OF O(M LOG N) SOLVER

• Input: graph Laplacian G
• Compute low-stretch spanning tree T of G
• T ← (log^2 n)·T
• H ← G + T
• H ← Sample_T(H)
• Solve G by iterating on H and solving H recursively, but reuse T
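
Below is a heavily simplified, runnable sketch of this structure (not the actual O(m log n) algorithm): an ordinary spanning tree stands in for the low-stretch tree, off-tree edges are kept arbitrarily rather than sampled by stretch, the recursive solve is replaced by a dense pseudo-inverse, and preconditioned conjugate gradient plays the role of the outer iteration; networkx is assumed available.

```python
import numpy as np
import networkx as nx

def pcg(A, M_solve, b, iters=50):
    """Preconditioned conjugate gradient; M_solve(r) applies the preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_solve(r)
    p, rz = z.copy(), r @ z
    for _ in range(iters):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x, r = x + alpha * p, r - alpha * Ap
        z = M_solve(r)
        rz_new = r @ z
        if rz_new < 1e-18:                       # converged
            break
        p, rz = z + (rz_new / rz) * p, rz_new
    return x

nodes = list(range(50))
G = nx.circulant_graph(50, [1, 5])               # small connected test graph
T = nx.minimum_spanning_tree(G)                  # stand-in for a low-stretch tree
H = T.copy()
H.add_edges_from(list(G.edges())[:25])           # keep a few off-tree edges (not by stretch)

LG = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
LH = nx.laplacian_matrix(H, nodelist=nodes).toarray().astype(float)
LH_pinv = np.linalg.pinv(LH)                     # dense stand-in for the recursive solve

b = np.random.default_rng(0).standard_normal(50)
b -= b.mean()                                    # make L_G x = b consistent
x = pcg(LG, lambda r: LH_pinv @ r, b)
print(np.linalg.norm(LG @ x - b))                # tiny residual
```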

Page 55: Fast Regression Algorithms Using Spectral Graph Theory

EXTENSIONS / GENERALIZATIONS

• [Koutis-Levin-P `12]: sparsify mildly dense graphs in O(m) time
• [Miller-P `12]: general matrices: find a 'simpler' matrix that's similar, in O(m + n^{2.38+a}) time

Page 56: Fast Regression Algorithms Using Spectral Graph Theory

SUMMARY OF SOLVERS

• Spectral graph theory allows one to find similar, easier-to-solve graphs
• Backbone: good trees

Page 57: Fast Regression Algorithms Using Spectral Graph Theory

SOLVERS USING GRAPH THEORY

Fast solvers for graph Laplacians use combinatorial graph theory

Ax=b

Page 58: Fast Regression Algorithms Using Spectral Graph Theory

OUTLINE

• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings

Page 59: Fast Regression Algorithms Using Spectral Graph Theory

LOW STRETCH SPANNING TREE

Sampling probability: edge weight times effective resistance of the tree path
Unit weight case: length of the tree path

Low stretch spanning tree: small total stretch

Page 60: Fast Regression Algorithms Using Spectral Graph Theory

DIFFERENT THAN USUAL TREES

n^{1/2}-by-n^{1/2} unit weighted mesh

stretch(e) = O(1) for some edges, stretch(e) = Ω(n^{1/2}) for others; total stretch = Ω(n^{3/2})

'Haircomb' is both shortest path and max weight spanning tree
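
A small runnable check of this claim (assuming networkx): build the k-by-k grid and the haircomb tree, then sum the stretch over all edges; the total grows roughly like n^{3/2}.

```python
# Haircomb spanning tree of the k-by-k grid: row 0 as the spine, every column
# as a tooth.  With unit weights the stretch of an edge is the length of its
# tree path.
import networkx as nx

def haircomb_total_stretch(k):
    G = nx.grid_2d_graph(k, k)                                   # nodes are (row, col)
    T = nx.Graph()
    T.add_edges_from(((0, j), (0, j + 1)) for j in range(k - 1))              # spine
    T.add_edges_from(((i, j), (i + 1, j)) for i in range(k - 1) for j in range(k))  # teeth
    return sum(len(nx.shortest_path(T, u, v)) - 1 for u, v in G.edges())

for k in (8, 16, 32):
    print(k * k, haircomb_total_stretch(k))      # n and its haircomb total stretch
```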

Page 61: Fast Regression Algorithms Using Spectral Graph Theory

A BETTER TREE FOR THE GRID

Recursive ‘C’

Page 62: Fast Regression Algorithms Using Spectral Graph Theory

LOW STRETCH SPANNING TREES

[Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08]: Any graph has a spanning tree with total stretch O(m log n)

Hiding loglog n factors

Page 63: Fast Regression Algorithms Using Spectral Graph Theory

ISSUE: RUNNING TIME

Algorithms given by [Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08] take O(n log^2 n + m log n) time

Reason: O(log n) shortest path computations

Page 64: Fast Regression Algorithms Using Spectral Graph Theory

SPEED UP

[Koutis-Miller-P `11]:
• Round edge weights to powers of 2
• k = log n, total work = O(m log n)

[Orlin-Madduri-Subramani-Williamson `10]: shortest path on graphs with k distinct weights can run in O(m log_{m/n} k) time

Hiding loglog n factors; we actually improve these

Page 65: Fast Regression Algorithms Using Spectral Graph Theory

PARALLEL ALGORITHM?

• [Blelloch-Gupta-Koutis-Miller-P-Tangwongsan `11]: current framework parallelizes to O(m^{1/3+a}) depth
• Combine with Laplacian paradigm → fast parallel graph algorithms

Page 66: Fast Regression Algorithms Using Spectral Graph Theory

PARALLEL GRAPH ALGORITHMS?

• Before this work: parallel time > state-of-the-art sequential time
• Our result: parallel work close to sequential, and O(m^{2/3}) time

Page 67: Fast Regression Algorithms Using Spectral Graph Theory

FUNDAMENTAL PROBLEM

Long standing open problem: theoretical speedups for BFS / shortest path in directed graphs

Sequential algorithms are too fast!

Page 68: Fast Regression Algorithms Using Spectral Graph Theory

PARALLEL ALGORITHM?

First step of framework by [Elkin-Emek-Spielman-Teng `05]: shortest path

Page 69: Fast Regression Algorithms Using Spectral Graph Theory

PARALLEL TREE EMBEDDING

• Workaround: use earlier algorithm by [Alon-Karp-Peleg-West `95]
• Idea: repeated clustering
• Based on ideas from [Cohen `93, `00] for approximating shortest path

Page 70: Fast Regression Algorithms Using Spectral Graph Theory

PARALLEL TREE EMBEDDING

Page 71: Fast Regression Algorithms Using Spectral Graph Theory

THE BIG PICTURE

• Need fast linear system solvers for graph regression
• Need combinatorial graph algorithms for fast solvers

Page 72: Fast Regression Algorithms Using Spectral Graph Theory

ONGOING / FUTURE WORK

• Better regression?
• Faster / parallel solvers?
• Sparse approximate (pseudo)inverse?
• Other types of systems?

Page 73: Fast Regression Algorithms Using Spectral Graph Theory

THANK YOU!

Questions?