Fast Regression Algorithms Using Spectral Graph Theory

Richard Peng

OUTLINE

• Regression: why and how
• Spectra: fast solvers
• Graphs: tree embeddings

LEARNING / INFERENCE

Find (hidden) pattern in (noisy) data

[Figure: input signal s and output]

REGRESSION

• p ≥ 1: convex
• Convex constraints, e.g. linear equalities

Minimize: $\|x\|_p$

Subject to: constraints on x


APPLICATION 0: LASSO

Widely used in practice:
• Structured output
• Robust to noise

[Tibshirani `96]: Minimize $\|x\|_1$ subject to $Ax = s$

APPLICATION 1: IMAGES

No bears were harmed in the making of these slides

Poisson image processing

Minimize $\sum_{i \sim j \in E} (x_i - x_j - s_{i \sim j})^2$
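The objective above is a least-squares problem over the edges. Here is a minimal sketch, assuming a connected graph given as a list of (i, j) pairs and using numpy for dense linear algebra. Writing the objective as $\|Bx - s\|^2$ with the signed edge-vertex incidence matrix B, the normal equations are the Laplacian system $B^T B x = B^T s$. The helper names `incidence_matrix` and `poisson_solve` are hypothetical. Fixing $x_0 = 0$ is just one way to remove the additive-constant ambiguity. The dense solve stands in for the fast Laplacian solvers discussed later.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Signed incidence matrix: the row for edge (i, j) has +1 at i and -1 at j."""
    B = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(edges):
        B[row, i] = 1.0
        B[row, j] = -1.0
    return B

def poisson_solve(n, edges, s):
    """Minimize sum over edges (x_i - x_j - s_ij)^2 with x_0 fixed to 0.

    The objective is ||Bx - s||^2, whose normal equations are L x = B^T s
    with the graph Laplacian L = B^T B (assumes the graph is connected).
    """
    B = incidence_matrix(n, edges)
    L = B.T @ B
    rhs = B.T @ np.asarray(s, dtype=float)
    x = np.zeros(n)
    x[1:] = np.linalg.solve(L[1:, 1:], rhs[1:])  # grounded system is nonsingular
    return x

# Differences on a triangle, consistent with x = (0, 1, 3): recovered exactly.
edges = [(1, 0), (2, 1), (2, 0)]
x = poisson_solve(3, edges, [1.0, 2.0, 3.0])
# Inconsistent (noisy) differences: the least-squares fit is returned instead.
s_noisy = np.array([1.0, 2.0, 2.7])
x_noisy = poisson_solve(3, edges, s_noisy)
```

With noisy data, the residual gradient $B^T(Bx - s)$ vanishes at the returned point, which is exactly the normal-equation condition.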

APPLICATION 2: MIN CUT

Remove fewest edges to separate vertices s and t

Minimize $\sum_{ij \in E} |x_i - x_j|$ subject to $x_s = 0$, $x_t = 1$

[Figure: example graph with vertices s and t labeled 0 and 1]

Fractional solution = integral solution: some optimal x has all entries in {0, 1}, i.e. it is a cut.
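Below is a brute-force sketch of the L1 formulation, meant only for illustration. For a 0/1 labeling x, $\sum_{ij \in E} |x_i - x_j|$ counts the edges that cross the cut. Minimizing over integral x with $x_s = 0$ and $x_t = 1$ therefore gives the min s-t cut. The function names are hypothetical. The enumeration is exponential in n and searches only integral solutions. The usage also evaluates one fractional labeling, which does no better on this example.

```python
from itertools import product

def l1_cut_value(edges, x):
    """Objective sum over edges |x_i - x_j|; for 0/1 labels it counts cut edges."""
    return sum(abs(x[i] - x[j]) for i, j in edges)

def min_cut_bruteforce(n, edges, s, t):
    """Minimize the L1 objective over 0/1 labelings with x_s = 0 and x_t = 1.

    Exponential in n: only meant to illustrate the formulation on tiny graphs.
    """
    free = [v for v in range(n) if v not in (s, t)]
    best = None
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n  # x_s stays 0
        x[t] = 1
        for v, b in zip(free, bits):
            x[v] = b
        value = l1_cut_value(edges, x)
        if best is None or value < best:
            best = value
    return best

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
best_cut = min_cut_bruteforce(4, edges, s=0, t=3)       # 2 edges separate 0 from 3
fractional = l1_cut_value(edges, [0.0, 0.5, 0.5, 1.0])  # also 2.0, no improvement
```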

REGRESSION ALGORITHMS

Convex optimization
• 1940~1960: simplex, tractable
• 1960~1980: ellipsoid, poly time
• 1980~2000: interior point, efficient: $\tilde{O}(m^{1/2})$ interior steps

• m = # non-zeros
• $\tilde{O}$ hides log factors


EFFICIENCY MATTERS

• $m > 10^6$ for most images
• Even bigger ($10^9$): videos, 3D medical data

At this scale, even $\tilde{O}(m^{1/2})$ interior steps are costly.

KEY SUBROUTINE

Each step of interior point algorithms finds a step direction

Minimization reduces to linear system solves: $Ax = b$
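The sketch below makes "finds a step direction" concrete with a generic Newton step for an equality-constrained problem, assuming numpy. The direction comes from one solve with the KKT matrix. Interior point methods also add barrier terms to the Hessian, and this sketch omits them. `newton_step` and the test problem are hypothetical.

```python
import numpy as np

def newton_step(H, g, A):
    """Newton direction at a feasible point for min f(x) s.t. A x = b.

    Solves the KKT system  [H  A^T] [dx]   [-g]
                           [A   0 ] [nu] = [ 0]
    where H is the Hessian and g the gradient of f at the current point.
    """
    n, p = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-g, np.zeros(p)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]  # drop the multipliers nu

# f(x) = 0.5 x^T Q x - c^T x subject to x_1 + x_2 + x_3 = 1.
Q = np.diag([1.0, 2.0, 4.0])
c = np.ones(3)
A = np.array([[1.0, 1.0, 1.0]])
x0 = np.array([1.0, 0.0, 0.0])       # feasible starting point
dx = newton_step(Q, Q @ x0 - c, A)   # Hessian Q, gradient Q x0 - c
x1 = x0 + dx                         # exact optimum, since f is quadratic
```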

MORE REASONS FOR FAST SOLVERS

[Boyd-Vandenberghe `04], Figure 11.20: "The growth in the average number of Newton iterations (on randomly generated SDPs)… is very small"

LINEAR SYSTEM SOLVERS

• [1st century CE] Gaussian elimination: $O(m^3)$
• [Strassen `69] $O(m^{2.8})$
• [Coppersmith-Winograd `90] $O(m^{2.3755})$
• [Stothers `10] $O(m^{2.3737})$
• [Vassilevska Williams `11] $O(m^{2.3727})$

All of these take more than $m^2$ time.

NOT FAST, NOT USED

• Preferred in practice: coordinate descent, subgradient methods
• Solution quality traded for time

FAST GRAPH-BASED L2 REGRESSION [SPIELMAN-TENG `04]

Input: linear system $Ax = b$, where A is related to graphs
Output: solution to $Ax = b$
Runtime: nearly linear, $\tilde{O}(m)$

(Details later in this talk.)

GRAPHS USING ALGEBRA

Fast convergence + low cost per step = state-of-the-art algorithms

LAPLACIAN PARADIGM

• [Daitch-Spielman `08]: min-cost flow
• [Christiano-Kelner-Mądry-Spielman-Teng `11]: approximate maximum flow / min cut

EXTENSION 1

[Chin-Mądry-Miller-P `12]: regression, image processing, grouped L2

EXTENSION 2

[Kelner-Miller-P `12]: k-commodity flow
Dual: k-variate labeling of graphs

EXTENSION 3

[Miller-P `13]: faster for structured images / separable graphs

NEED: FAST LINEAR SYSTEM SOLVERS

Implication of fast solvers:
• Fast regression routines
• Parallel, work-efficient graph algorithms


OTHER APPLICATIONS

• [Tutte `66]: planar embedding
• [Boman-Hendrickson-Vavasis `04]: PDEs
• [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator

OUTLINE

• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings

PROBLEM

Given: matrix A, vector b
Size of A:
• n-by-n
• m non-zeros

SPECIAL STRUCTURE OF A

A = Deg – Adj
• Deg: diag(degree)
• Adj: adjacency matrix

$A_{ij} = \deg(i)$ if $i = j$, and $A_{ij} = -w(ij)$ otherwise (0 if i and j are not adjacent)

[Gremban-Miller `96]: extensions to SDD matrices
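Here is a minimal sketch, assuming numpy and a hypothetical list of (i, j, w) edge triples. It builds A = Deg – Adj and checks the properties used later: zero row sums, symmetry, positive semidefiniteness, and the quadratic form $x^T A x = \sum_{ij \in E} w(ij)(x_i - x_j)^2$.

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Graph Laplacian A = Deg - Adj for an undirected weighted graph.

    Diagonal entries are weighted degrees; entry (i, j) is -w(ij) for an edge.
    """
    A = np.zeros((n, n))
    for i, j, w in weighted_edges:
        A[i, i] += w
        A[j, j] += w
        A[i, j] -= w
        A[j, i] -= w
    return A

# Triangle with one heavier edge (0, 2).
A = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)])
row_sums = A.sum(axis=1)             # all zero: constants are in the null space
eigenvalues = np.linalg.eigvalsh(A)  # symmetric, all eigenvalues >= 0
x = np.array([1.0, 2.0, 4.0])
quad = x @ A @ x                     # 1*(1-2)^2 + 1*(2-4)^2 + 2*(1-4)^2 = 23
```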

UNSTRUCTURED GRAPHS

• Social networks
• Intermediate systems of other algorithms are almost adversarial

NEARLY LINEAR TIME SOLVERS [SPIELMAN-TENG `04]

Input: n-by-n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: approximate solution x' s.t. $\|x - x'\|_A < \epsilon \|x\|_A$

Runtime: nearly linear, $O(m \log^c n \log(1/\epsilon))$ expected

• Runtime is cost per bit of accuracy.
• Error in the A-norm: $\|y\|_A = \sqrt{y^T A y}$.
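The sketch below shows how the A-norm guarantee could be checked, assuming numpy; the function names are hypothetical. Constant vectors lie in the null space of a connected graph's Laplacian. Shifting x' by a constant therefore costs nothing in the A-norm, while a local perturbation does.

```python
import numpy as np

def a_norm(A, y):
    """|y|_A = sqrt(y^T A y); assumes A is symmetric positive semidefinite."""
    return float(np.sqrt(max(float(y @ A @ y), 0.0)))  # clamp tiny negative rounding

def relative_a_norm_error(A, x, x_approx):
    """Ratio |x - x'|_A / |x|_A from the guarantee |x - x'|_A < eps |x|_A."""
    return a_norm(A, x - x_approx) / a_norm(A, x)

A = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])  # path 0-1-2
x = np.array([0.0, 1.0, 3.0])
shifted = relative_a_norm_error(A, x, x + 5.0)                          # zero error
perturbed = relative_a_norm_error(A, x, x + np.array([0.0, 0.1, 0.0]))  # sqrt(0.02 / 5)
```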

HOW MANY LOGS

Runtime: $O(m \log^c n \log(1/\epsilon))$

Value of c: I don’t know

[Spielman]: c≤70

[Koutis]: c≤15

[Miller]: c≤32

[Teng]: c≤12

[Orecchia]: c≤6

When $n = 10^6$, $\log^6 n > 10^6$
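Here is a quick arithmetic check of this claim. The slide does not state the base of the logarithm, so both base 2 and the natural log are shown; either way the factor already exceeds n.

```python
import math

n = 10**6
log2_sixth = math.log2(n) ** 6  # roughly 6.3e7
ln_sixth = math.log(n) ** 6     # roughly 7.0e6
```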

PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `10]

$\|x - x'\|_A < \epsilon \|x\|_A$

Runtime: $O(m \log^2 n \log(1/\epsilon))$
• Runtime is cost per bit of accuracy.
• Error in the A-norm: $\|y\|_A = \sqrt{y^T A y}$.

PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `11]

$\|x - x'\|_A < \epsilon \|x\|_A$

Runtime: $O(m \log n \log(1/\epsilon))$
• Runtime is cost per bit of accuracy.
• Error in the A-norm: $\|y\|_A = \sqrt{y^T A y}$.

STAGES OF THE SOLVER

• Iterative methods
• Spectral sparsifiers
• Low-stretch spanning trees

ITERATIVE METHODS

Numerical analysis: we can solve systems in A by iteratively solving systems in a spectrally similar, but easier, B
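Below is a minimal sketch of the simplest such scheme, preconditioned Richardson iteration, assuming numpy. It uses a callable `solve_B` that applies $B^{-1}$, which is a hypothetical interface. Here ≺ is read as the Loewner order. B = 4I is chosen only because it is trivially easy and satisfies A ≺ B for this small matrix. The solvers in this talk use graph-based B and faster outer iterations.

```python
import numpy as np

def preconditioned_richardson(A, b, solve_B, iters):
    """Solve A x = b using products with A and solves with an easier B.

    Update: x <- x + B^{-1} (b - A x). If A <= B <= kA, the eigenvalues of
    B^{-1} A lie in [1/k, 1], so the A-norm error shrinks by a factor of at
    most (1 - 1/k) per iteration.
    """
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + solve_B(b - A @ x)
    return x

# Grounded path Laplacian: symmetric positive definite, diagonally dominant.
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])
# B = 4I: A <= B since A's largest eigenvalue is 2 + sqrt(2) < 4,
# and B <= kA with k = 4 / (2 - sqrt(2)), about 6.8.
x = preconditioned_richardson(A, b, lambda r: r / 4.0, iters=200)
```

The iteration count grows linearly with k here. This is why a B that is both easy to solve and spectrally close to A matters.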

WHAT IS SPECTRALLY SIMILAR?

A ≺ B ≺ kA for some small k

• Ideas from scalars hold!
• A ≺ B: for any vector x, $\|x\|_A^2 \le \|x\|_B^2$, i.e. $x^T A x \le x^T B x$
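The sketch below checks spectral similarity numerically, assuming numpy and assuming both matrices are Laplacians of connected graphs on the same vertices. It computes the range of $x^T B x / x^T A x$ over x orthogonal to the all-ones vector. If that range is $[lo, hi]$, then $lo \cdot A \preceq B \preceq hi \cdot A$. The helper names are hypothetical. The example compares a triangle G with its spanning path scaled by 3 and finds G ≺ H ≺ 3G.

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Graph Laplacian Deg - Adj from (i, j, w) triples."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def relative_spectrum(A, B):
    """Min and max of x^T B x / x^T A x over x orthogonal to all-ones."""
    n = A.shape[0]
    # QR of [ones, e_0, ..., e_{n-2}]: the first column of Q spans the ones
    # vector, and the remaining columns are an orthonormal basis of its complement.
    Q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
    Q = Q[:, 1:]
    vals = np.linalg.eigvals(np.linalg.solve(Q.T @ A @ Q, Q.T @ B @ Q)).real
    return float(vals.min()), float(vals.max())

G = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])  # triangle
H = 3.0 * laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)])        # spanning path, scaled by 3
lo, hi = relative_spectrum(G, H)                          # G <= H <= 3G
```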

[Vaidya `91]: Since A is a graph G, B should be a graph H too!

‘EASIER’ H

Goal: H with fewer edges that’s similar to G

Ways of easier:
• Fewer vertices
• Fewer edges

Can reduce vertex count if edge count is small

GRAPH SPARSIFIERS

Sparse equivalents of graphs that preserve something

• Spanners: distance, diameter
• Cut sparsifiers: all cuts
• What we need: spectrum

WHAT WE NEED: ULTRASPARSIFIERS

[Spielman-Teng `04]: ultrasparsifiers with $n - 1 + O(m \log^p n / k)$ edges imply solvers with $O(m \log^p n)$ running time.

• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, $n - 1 + O(m \log^p n / k)$ edges
• Goal: G ≺ H ≺ kG

EXAMPLE: COMPLETE GRAPH

$O(n \log n)$ random edges (with scaling) suffice w.h.p.

GENERAL GRAPH SAMPLING MECHANISM

• For each edge e, flip a coin with Pr(keep) = P(e)
• Rescale kept edges by 1/P(e) to maintain expectation

Expected number of edges kept: $\sum_e P(e)$

Also need to prove concentration
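Here is the sampling mechanism above as a minimal sketch. It assumes edges given as (u, v, w) triples and a caller-supplied probability function `prob`, which is a hypothetical interface. The usage applies it to a complete graph with a uniform P(e). The code says nothing about concentration.

```python
import random

def sample_edges(weighted_edges, prob, rng):
    """Keep edge e with probability P(e) and rescale its weight by 1/P(e).

    Each edge's expected sampled weight equals its original weight, so the
    expected sampled Laplacian equals the original one. The expected number
    of kept edges is sum_e P(e).
    """
    kept = []
    for u, v, w in weighted_edges:
        p = prob(u, v, w)
        if rng.random() < p:
            kept.append((u, v, w / p))
    return kept

# Complete graph on 50 vertices; every edge kept with probability 0.2.
n = 50
complete = [(u, v, 1.0) for u in range(n) for v in range(u + 1, n)]
sample = sample_edges(complete, lambda u, v, w: 0.2, random.Random(0))
```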

EFFECTIVE RESISTANCE

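• View the graph as a circuit, with each edge e a resistor of resistance 1/w(e)
• R(u, v): pass 1 unit of current from u to v and measure the resulting voltage difference between u and v

Effective resistance in general: solve $Gx = e_{uv}$, where $e_{uv} = e_u - e_v$ is the signed indicator vector (+1 at u, -1 at v); then $R(u, v) = x_u - x_v$.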
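Below is a minimal sketch of this recipe, assuming numpy and dense matrices. The Laplacian is singular, so the sketch takes the least-squares solution from `numpy.linalg.lstsq`. For a connected graph, any solution gives the same $x_u - x_v$. The helper names are hypothetical. The usage checks a series pair, a parallel pair, and a triangle.

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Graph Laplacian Deg - Adj from (i, j, w) triples; repeated edges add up."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def effective_resistance(L, u, v):
    """Inject +1 unit of current at u, extract it at v, return x_u - x_v."""
    e = np.zeros(L.shape[0])
    e[u], e[v] = 1.0, -1.0
    x = np.linalg.lstsq(L, e, rcond=None)[0]  # L is singular; e is in its range
    return float(x[u] - x[v])

series = effective_resistance(laplacian(3, [(0, 1, 2.0), (1, 2, 4.0)]), 0, 2)    # 1/2 + 1/4
parallel = effective_resistance(laplacian(2, [(0, 1, 1.0), (0, 1, 1.0)]), 0, 1)  # 1/(1 + 1)
triangle = effective_resistance(
    laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]), 0, 1)                 # 1 in parallel with 2
```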

(REMEDIAL?) EE101

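• Single edge: $R(e) = 1/w(e)$
• Series: $R(u, v) = R(e_1) + \dots + R(e_l)$

One edge of weight $w_1$ between u and v: $R(u, v) = 1/w_1$
Two edges of weights $w_1, w_2$ in series: $R(u, v) = 1/w_1 + 1/w_2$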

SPECTRAL SPARSIFICATION BY EFFECTIVE RESISTANCE

[Spielman-Srivastava `08]: Setting P(e) to $W(e) R(u, v) \cdot O(\log n)$ for each edge e = uv gives G ≺ H ≺ 2G*

*Ignoring probabilistic issues

[Foster `49]: for a connected graph, $\sum_e W(e) R(e) = n - 1$

So the result is a spectral sparsifier with $O(n \log n)$ edges
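Foster's theorem can be checked numerically. This sketch assumes numpy and computes $R(e) = (e_u - e_v)^T L^+ (e_u - e_v)$ with the pseudo-inverse $L^+$. That is fine for small dense examples but is not how a fast algorithm would proceed. The constant `c` is a hypothetical stand-in for the constant hidden in $O(\log n)$. Because the scores sum to n - 1, the expected number of sampled edges is at most $c (n - 1) \log n$.

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Graph Laplacian Deg - Adj from (i, j, w) triples."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def weighted_resistances(n, weighted_edges):
    """W(e) * R(e) for each edge, using the Laplacian pseudo-inverse."""
    L_pinv = np.linalg.pinv(laplacian(n, weighted_edges))
    return np.array([w * (L_pinv[u, u] + L_pinv[v, v] - 2 * L_pinv[u, v])
                     for u, v, w in weighted_edges])

n = 4
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 3.0), (0, 2, 1.0)]
scores = weighted_resistances(n, edges)
total = scores.sum()            # Foster: n - 1 = 3 for a connected graph

c = 4.0                         # hypothetical oversampling constant
probs = np.minimum(1.0, c * scores * np.log(n))
expected_edges = probs.sum()    # at most c * (n - 1) * log(n)
```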

Ultrasparsifier? Solver???

THE CHICKEN AND EGG PROBLEM

How to find effective resistance?

[Spielman-Srivastava `08]: computing effective resistances uses a solver
[Spielman-Teng `04]: the solver needs a sparsifier

OUR WORK AROUND

• Use upper bounds on effective resistance, R'(u, v)
• Modify the problem

RAYLEIGH’S MONOTONICITY LAW

Rayleigh’s Monotonicity Law: R(u, v) can only increase when edges are removed

Calculate effective resistance w.r.t. a tree T
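Rayleigh's law is what makes tree resistances usable as upper bounds R'(u, v). The sketch below assumes numpy. It computes tree resistances with the series rule along the unique tree path and compares them with exact graph resistances from the pseudo-inverse, for every vertex pair. The helper names and the example graph are hypothetical.

```python
import numpy as np
from collections import deque

def laplacian(n, weighted_edges):
    """Graph Laplacian Deg - Adj from (i, j, w) triples."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def graph_resistance(n, weighted_edges, u, v):
    """Exact effective resistance via the Laplacian pseudo-inverse."""
    L_pinv = np.linalg.pinv(laplacian(n, weighted_edges))
    return float(L_pinv[u, u] + L_pinv[v, v] - 2 * L_pinv[u, v])

def tree_resistance(n, tree_edges, u, v):
    """Series rule: sum of 1/w along the unique u-v path in a tree (BFS)."""
    adj = [[] for _ in range(n)]
    for a, b, w in tree_edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = {u: 0.0}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b, w in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1.0 / w
                queue.append(b)
    return dist[v]

graph = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 3.0), (0, 2, 1.0)]
tree = [(0, 1, 1.0), (1, 2, 2.0), (3, 0, 3.0)]  # a spanning tree of `graph`
pairs = [(u, v) for u in range(4) for v in range(u + 1, 4)]
upper_bounds_hold = all(
    tree_resistance(4, tree, u, v) >= graph_resistance(4, graph, u, v) - 1e-9
    for u, v in pairs
)
```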

SAMPLING PROBABILITIES ACCORDING TO TREE

Sample Probability: edge weight times effective resistance of tree path

Goal: small total stretch
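The stretch of an edge e = uv with respect to a tree T is w(e) times the resistance of the tree path from u to v. Total stretch is therefore exactly $\sum_e W(e) R'(e)$. Below is a minimal sketch with hypothetical helper names. It recomputes a BFS for every edge for simplicity, giving O(mn) time. The example is a unit-weight 4-cycle with a path as the tree: each tree edge has stretch 1 and the off-tree edge has stretch 3.

```python
from collections import deque

def tree_path_resistances(n, tree_edges, source):
    """Sum of 1/w along the tree path from `source` to every vertex."""
    adj = [[] for _ in range(n)]
    for a, b, w in tree_edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = [None] * n
    dist[source] = 0.0
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b, w in adj[a]:
            if dist[b] is None:
                dist[b] = dist[a] + 1.0 / w
                queue.append(b)
    return dist

def edge_stretches(n, graph_edges, tree_edges):
    """Stretch of each graph edge: w(e) times the tree-path resistance R'(u, v)."""
    return [w * tree_path_resistances(n, tree_edges, u)[v] for u, v, w in graph_edges]

cycle = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
path_tree = cycle[:3]                    # spanning path 0-1-2-3
stretches = edge_stretches(4, cycle, path_tree)
total = sum(stretches)                   # sum_e W(e) R'(e)
```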


GOOD TREES EXIST

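Every graph has a spanning tree with total stretch $O(m \log n)$ (hiding $\log \log n$ factors), so $\sum_e W(e) R'(e) = O(m \log n)$.

Sampling with $P(e) = W(e) R'(e) \cdot O(\log n)$ then keeps $O(m \log^2 n)$ edges: too many!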
