Post on 02-Jan-2016
Fast Regression Algorithms Using Spectral Graph Theory
Richard Peng
OUTLINE
• Regression: why and how
• Spectra: fast solvers
• Graphs: tree embeddings
LEARNING / INFERENCE
Find (hidden) pattern in (noisy) data
[Figure: input signal s and the recovered output]
REGRESSION
Minimize: |x|_p
Subject to: constraints on x
• p ≥ 1: convex
• Convex constraints, e.g. linear equalities
APPLICATION 0: LASSO
Widely used in practice:
• Structured output
• Robust to noise
[Tibshirani `96]: Min |x|_1 s.t. Ax = s
APPLICATION 1: IMAGES
No bears were harmed in the making of these slides
Poisson image processing
Min Σ_{i~j∈E} (x_i - x_j - s_{i~j})^2
APPLICATION 2: MIN CUT
Remove fewest edges to separate vertices s and t
Min Σ_{ij∈E} |x_i - x_j|
s.t. x_s = 0, x_t = 1
[Figure: graph from s to t with each vertex labeled 0 or 1]
Fractional solution = integral solution
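A small sketch of the min cut objective above, assuming a hypothetical 5-edge graph on 4 vertices (not from the slides). With 0/1 labels, Σ|x_i - x_j| counts exactly the edges crossing the cut, so enumerating labelings finds the min cut; the fractional point evaluated at the end does no better, which is consistent with the claim on the slide.

```python
from itertools import product

def cut_objective(edges, x):
    """Sum of |x_i - x_j| over edges; for 0/1 labels this counts cut edges."""
    return sum(abs(x[i] - x[j]) for i, j in edges)

def min_cut_by_enumeration(n, edges, s, t):
    """Try every 0/1 labeling with x_s = 0 and x_t = 1 (tiny graphs only)."""
    best = None
    for labels in product((0, 1), repeat=n):
        if labels[s] != 0 or labels[t] != 1:
            continue
        value = cut_objective(edges, labels)
        if best is None or value < best[0]:
            best = (value, labels)
    return best

# Hypothetical example graph: s = 0, t = 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
cut_value, cut_labels = min_cut_by_enumeration(4, edges, s=0, t=3)

# A fractional labeling that also satisfies x_s = 0, x_t = 1.
fractional_value = cut_objective(edges, (0, 0.5, 0.5, 1))
```

Enumeration is exponential in n and is only meant to make the objective concrete; it is not how the algorithms in these slides work.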
REGRESSION ALGORITHMS
Convex optimization
• 1940~1960: simplex, tractable
• 1960~1980: ellipsoid, poly time
• 1980~2000: interior point, efficient
Õ(m^{1/2}) interior steps
• m = # non-zeros
• Õ hides log factors
EFFICIENCY MATTERS
• m > 10^6 for most images
• Even bigger (10^9): videos, 3D medical data
Õ(m^{1/2})
KEY SUBROUTINE
Each step of interior point algorithms finds a step direction
minimize Ax=b
Linear system solves
MORE REASONS FOR FAST SOLVERS
[Boyd-Vandenberghe `04], Figure 11.20: The growth in the average number of Newton iterations (on randomly generated SDPs)… is very small
LINEAR SYSTEM SOLVERS
• [1st century CE] Gaussian Elimination: O(m^3)
• [Strassen `69]: O(m^{2.8})
• [Coppersmith-Winograd `90]: O(m^{2.3755})
• [Stothers `10]: O(m^{2.3737})
• [Vassilevska Williams `11]: O(m^{2.3727})
Total: > m^2
NOT FAST NOT USED:
• Preferred in practice: coordinate descent, subgradient methods
• Solution quality traded for time
FAST GRAPH BASED L2 REGRESSION [SPIELMAN-TENG ‘04]
Input: Linear system where A is related to graphs, b
Output: Solution to Ax=b
Runtime: Nearly linear, Õ(m)
More in 12 slides
GRAPHS USING ALGEBRA
Fast convergence + low cost per step = state of the art algorithms
Ax=b
LAPLACIAN PARADIGM
[Daitch-Spielman `08]: mincost flow
[Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut
Ax=b
EXTENSION 1
[Chin-Mądry-Miller-P `12]: regression, image processing, grouped L2
EXTENSION 2
[Kelner-Miller-P `12]: k-commodity flow
Dual: k-variate labeling of graphs
EXTENSION 3
[Miller-P `13]: faster for structured images / separable graphs
NEED: FAST LINEAR SYSTEM SOLVERS
Implication of fast solvers:
• Fast regression routines
• Parallel, work efficient graph algorithms
minimize Ax=b
OTHER APPLICATIONS
• [Tutte `66]: planar embedding
• [Boman-Hendrickson-Vavasis `04]: PDEs
• [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator
OUTLINE
• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings
PROBLEM
Given: matrix A, vector b
Size of A:
• n-by-n
• m non-zeros
Ax=b
SPECIAL STRUCTURE OF A
A = Deg – Adj
• Deg: diag(degree)
• Adj: adjacency matrix
A_ij = deg(i) if i = j, -w(ij) otherwise
[Gremban-Miller `96]: extensions to SDD matrices
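A minimal sketch of A = Deg – Adj built from a weighted edge list. The function name `laplacian` and the dense list-of-rows representation are illustrative assumptions; a real solver would keep A sparse with m non-zeros.

```python
def laplacian(n, weighted_edges):
    """Dense graph Laplacian: weighted degree on the diagonal, -w(ij) off it."""
    A = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        A[u][u] += w   # Deg contribution
        A[v][v] += w
        A[u][v] -= w   # -Adj contribution
        A[v][u] -= w
    return A

# Hypothetical path graph 0 - 1 - 2 with weights 1 and 2.
A = laplacian(3, [(0, 1, 1.0), (1, 2, 2.0)])
```

Every row sums to zero, so the all-ones vector is in the null space; this is why the solver statements require b = Ax for some x.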
UNSTRUCTURED GRAPHS
• Social network
• Intermediate systems of other algorithms are almost adversarial
NEARLY LINEAR TIME SOLVERS [SPIELMAN-TENG ‘04]
Input: n by n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x’ s.t. |x-x’|_A < ε|x|_A
Runtime: Nearly linear. O(m log^c n log(1/ε)) expected
• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(y^T A y).
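To make the error guarantee concrete, here is a small sketch of the A-norm and the test |x-x’|_A < ε|x|_A. The helper names `a_norm` and `meets_guarantee` and the example matrix (a weighted path Laplacian) are assumptions for illustration.

```python
import math

def a_norm(A, y):
    """|y|_A = sqrt(y^T A y) for a symmetric positive semidefinite A."""
    Ay = [sum(a * yi for a, yi in zip(row, y)) for row in A]
    quadratic_form = sum(yi * v for yi, v in zip(y, Ay))
    return math.sqrt(max(quadratic_form, 0.0))  # guard against rounding below 0

def meets_guarantee(A, x, x_approx, eps):
    """Check |x - x'|_A < eps * |x|_A."""
    error = [xi - ai for xi, ai in zip(x, x_approx)]
    return a_norm(A, error) < eps * a_norm(A, x)

# Laplacian of the path 0 - 1 - 2 with weights 1 and 2 (hypothetical example).
A = [[1.0, -1.0, 0.0], [-1.0, 3.0, -2.0], [0.0, -2.0, 2.0]]
x = [1.0, 0.0, -1.0]
```

For a Laplacian the A-norm of a constant vector is zero, so an approximate solution shifted by a constant still meets the guarantee.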
HOW MANY LOGS
Runtime: O(m log^c n log(1/ε))
Value of c: I don’t know
[Spielman]: c≤70
[Koutis]: c≤15
[Miller]: c≤32
[Teng]: c≤12
[Orecchia]: c≤6
When n = 10^6, log^6 n > 10^6
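A quick arithmetic check of that last line. The slide does not state the base of the logarithm, so this sketch tries both base 2 and the natural log; the inequality holds either way.

```python
import math

n = 10**6
log2_power = math.log2(n) ** 6   # log2(10^6) is about 19.9
ln_power = math.log(n) ** 6      # ln(10^6) is about 13.8

for c in (1, 2, 6):
    print(f"c = {c}: log2(n)^c is about {math.log2(n) ** c:.3g}")
```

So with c = 6 the polylog factor alone already exceeds n, which is why lowering c matters in practice.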
PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `10]
Input: n by n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x’ s.t. |x-x’|_A < ε|x|_A
Runtime: O(m log^2 n log(1/ε))
• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(y^T A y).
PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `11]
Input: n by n graph Laplacian A with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x’ s.t. |x-x’|_A < ε|x|_A
Runtime: O(m log n log(1/ε))
• Runtime is cost per bit of accuracy.
• Error in the A-norm: |y|_A = √(y^T A y).
STAGES OF THE SOLVER
• Iterative Methods
• Spectral Sparsifiers
• Low Stretch Spanning Trees
ITERATIVE METHODS
Numerical analysis: can solve systems in A by iteratively solving a spectrally similar, but easier, B
WHAT IS SPECTRALLY SIMILAR?
A ≺ B ≺ kA for some small k
• Ideas from scalars hold!
• A ≺ B: for any vector x, |x|_A^2 < |x|_B^2
[Vaidya `91]: Since A is a graph, B should be too!
[Vaidya `91]: Since G is a graph, H should be too!
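One way to see how a spectrally similar B helps is preconditioned Richardson iteration, x ← x + B⁻¹(b - Ax). This is a minimal sketch of that standard method, not the specific iteration used in the papers cited here. The 2-by-2 matrix A (positive definite rather than a Laplacian, to keep the example tiny) and the choice B = 3I are assumptions; they satisfy A ≼ B ≼ 3A, so each step shrinks the error by a factor of at most 2/3.

```python
def mat_vec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def preconditioned_richardson(A, solve_B, b, iterations):
    """Repeat x <- x + B^{-1}(b - A x); solve_B applies B^{-1} to a vector."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        residual = [bi - yi for bi, yi in zip(b, mat_vec(A, x))]
        correction = solve_B(residual)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x

A = [[2.0, -1.0], [-1.0, 2.0]]   # eigenvalues 1 and 3

def solve_B(r):
    """The 'easier' system: B = 3I, so solving is a division."""
    return [ri / 3.0 for ri in r]

x = preconditioned_richardson(A, solve_B, [1.0, 0.0], 60)
```

The number of iterations needed grows with k, while each iteration costs one multiply by A and one solve in B; that trade-off is the "fast convergence + low cost per step" balance.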
‘EASIER’ H
Goal: H with fewer edges that’s similar to G
Ways of easier:
• Fewer vertices
• Fewer edges
Can reduce vertex count if edge count is small
GRAPH SPARSIFIERS
Sparse equivalents of graphs that preserve something
• Spanners: distance, diameter.
• Cut sparsifier: all cuts.
• What we need: spectrum
WHAT WE NEED: ULTRASPARSIFIERS
[Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n / k) edges imply solvers with O(m log^p n) running time.
• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n-1+O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG
EXAMPLE: COMPLETE GRAPH
O(nlogn) random edges (with scaling) suffice w.h.p.
GENERAL GRAPH SAMPLING MECHANISM
• For edge e, flip coin with Pr(keep) = P(e)
• Rescale to maintain expectation
Number of edges kept: ∑e P(e)
Also need to prove concentration
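The sampling mechanism above can be sketched as follows. The function names and the tiny edge list are illustrative assumptions; the point is that a kept edge gets weight w/P(e), so its expected weight is P(e) · w/P(e) = w, and the expected number of kept edges is ∑_e P(e).

```python
import random

def sample_edges(weighted_edges, keep_prob, rng):
    """Keep edge e with probability P(e); rescale kept weights by 1/P(e)."""
    sampled = []
    for (u, v, w), p in zip(weighted_edges, keep_prob):
        if rng.random() < p:          # never true when p == 0
            sampled.append((u, v, w / p))
    return sampled

def expected_edge_count(keep_prob):
    """Expected number of edges kept: the sum of the probabilities."""
    return sum(keep_prob)

edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 4.0)]   # hypothetical graph
probs = [1.0, 0.5, 0.25]
H = sample_edges(edges, probs, random.Random(0))
```

This only shows the expectation; the concentration argument mentioned on the slide is what makes the sampled graph spectrally close with high probability.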
EFFECTIVE RESISTANCE
• View the graph as a circuit
• R(u,v) = pass 1 unit of current from u to v, measure resistance of circuit
EE101
Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector (+1 at u, -1 at v), then R(u,v) = x_u – x_v.
(REMEDIAL?) EE101
• Single edge: R(e) = 1/w(e)
• Series: R(u, v) = R(e_1) + … + R(e_l)
Single edge of weight w_1 between u and v: R(u, v) = 1/w_1
Edges of weights w_1, w_2 in series between u and v: R(u, v) = 1/w_1 + 1/w_2
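The series rule can be checked against the general definition (solve Gx = e_uv, read off x_u – x_v). The sketch below assumes a connected graph with u ≠ v, grounds v at potential 0, and uses exact Gauss-Jordan elimination over fractions; it is a cubic-time illustration, not a fast solver.

```python
from fractions import Fraction

def effective_resistance(n, weighted_edges, u, v):
    """R(u, v) from the Laplacian system with vertex v grounded (x_v = 0)."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for a, b, w in weighted_edges:
        w = Fraction(w)
        L[a][a] += w
        L[b][b] += w
        L[a][b] -= w
        L[b][a] -= w
    keep = [i for i in range(n) if i != v]
    # Augmented system: reduced Laplacian | one unit of current injected at u.
    M = [[L[i][j] for j in keep] + [Fraction(1 if i == u else 0)] for i in keep]
    size = len(keep)
    for col in range(size):
        pivot = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [entry / M[col][col] for entry in M[col]]
        for r in range(size):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return M[keep.index(u)][size]   # x_u - x_v with x_v = 0

# Series example with hypothetical weights w_1 = 2, w_2 = 4 on the path 0 - 1 - 2.
series = effective_resistance(3, [(0, 1, 2), (1, 2, 4)], 0, 2)
```

Here `series` equals 1/2 + 1/4, matching the series rule.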
SPECTRAL SPARSIFICATION BY EFFECTIVE RESISTANCE
[Spielman-Srivastava `08]: Setting P(e) to W(e)R(u,v)·O(log n) gives G ≺ H ≺ 2G*
*Ignoring probabilistic issues
[Foster `49]: ∑_e W(e)R(e) = n-1
Spectral sparsifier with O(n log n) edges
Ultrasparsifier? Solver???
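Foster's identity can be checked on a small example using a classical fact that is not on the slides: in a unit-weight graph, W(e)R(e) equals the fraction of spanning trees that contain e. Since every spanning tree has n-1 edges, these fractions sum to n-1. The brute-force enumeration below is an illustrative sketch for tiny graphs only.

```python
from fractions import Fraction
from itertools import combinations

def is_spanning_tree(n, tree_edges):
    """n-1 edges form a spanning tree iff they contain no cycle (union-find)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in tree_edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
    return True

def spanning_tree_fractions(n, edges):
    """For each edge, the fraction of spanning trees containing it."""
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    return {e: Fraction(sum(e in t for t in trees), len(trees)) for e in edges}

# Complete graph on 4 vertices, unit weights.
k4_edges = list(combinations(range(4), 2))
fractions_k4 = spanning_tree_fractions(4, k4_edges)
foster_sum = sum(fractions_k4.values())
```

For this graph every edge has W(e)R(e) = 1/2 and the sum is 3 = n-1, so sampling with P(e) proportional to W(e)R(e) times O(log n) keeps O(n log n) edges in expectation.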
THE CHICKEN AND EGG PROBLEM
How to find effective resistance?
[Spielman-Srivastava `08]: use solver
[Spielman-Teng `04]: need sparsifier
OUR WORK AROUND
• Use upper bounds of effective resistance, R’(u,v)
• Modify the problem
RAYLEIGH’S MONOTONICITY LAW
Rayleigh’s Monotonicity Law: R(u, v) only increases when edges are removed
Calculate effective resistance w.r.t. a tree T
SAMPLING PROBABILITIES ACCORDING TO TREE
Sample probability: edge weight times effective resistance of tree path (the stretch)
Goal: small total stretch
GOOD TREES EXIST
Every graph has a spanning tree with total stretch O(m log n) (hiding log log n)
∑_e W(e)R’(e) = O(m log n)
Sampling with the extra O(log n) factor gives O(m log^2 n) edges, too many!
More in 12 slides (again!)
‘GOOD’ TREE???
Unit weight case: stretch ≥ 1 for all edges
Stretch = 1+1 = 2
WHAT ARE WE MISSING?
• Need:
  • G ≺ H ≺ kG
  • n-1+O(m log^p n / k) edges
• Generated:
  • G ≺ H ≺ 2G
  • n-1+O(m log^2 n) edges
Haven’t used k!
USE K, SOMEHOW
• Tree is good!
• Increase weights of tree edges by factor of k
G ≺ G’ ≺ kG
RESULT
• Tree heavier by factor of k
• Tree effective resistance decreases by factor of k
Stretch = 1/k+1/k = 2/k
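The effect of scaling the tree can be written out directly: stretch is the off-tree edge's weight times the sum of 1/w along its tree path, so multiplying every tree weight by k divides the stretch by k. The helper name `stretch` and the value k = 4 are assumptions for illustration.

```python
from fractions import Fraction

def stretch(edge_weight, tree_path_weights):
    """Edge weight times the resistance (sum of 1/w) of its tree path."""
    return edge_weight * sum(Fraction(1) / w for w in tree_path_weights)

k = 4
before = stretch(1, [1, 1])        # unit-weight edge, tree path of two unit edges
after = stretch(1, [k, k])         # same path after scaling tree weights by k
```

Here `before` is 2 and `after` is 2/k, matching the two slides.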
NOW SAMPLE?
Expected in H:
• Tree edges: n-1
• Off tree edges: O(m log^2 n / k)
Total: n-1+O(m log^2 n / k)
BUT WE CHANGED G!
G ≺ G’ ≺ kG
G’ ≺ H ≺ 2G’
Together: G ≺ H ≺ 2kG
WHAT WE NEED: ULTRASPARSIFIERS
[Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n / k) edges imply solvers with O(m log^p n) running time.
• Given: G with n vertices, m edges, parameter k
• Output: H with n vertices, n-1+O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG
G ≺ H ≺ 2kG with n-1+O(m log^2 n / k) edges
PSEUDOCODE OF O(MLOGN) SOLVER
• Input: Graph Laplacian G
• Compute low stretch tree T of G
• T ← (log^2 n) T
• H ← G + T
• H ← Sample_T(H)
• Solve G by iterating on H and solving recursively, but reuse T
EXTENSIONS / GENERALIZATIONS
• [Koutis-Levin-P `12]: sparsify mildly dense graphs in O(m) time
• [Miller-P `12]: general matrices: find ‘simpler’ matrix that’s similar in O(m + n^{2.38+a}) time.
SUMMARY OF SOLVERS
• Spectral graph theory allows one to find similar, easier to solve graphs
• Backbone: good trees
SOLVERS USING GRAPH THEORY
Fast solvers for graph Laplacians use combinatorial graph theory
Ax=b
OUTLINE
• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings
LOW STRETCH SPANNING TREE
Sampling probability: edge weight times effective resistance of tree path
Unit weight case: length of tree path
Low stretch spanning tree: small total stretch
DIFFERENT THAN USUAL TREES
n^{1/2}-by-n^{1/2} unit weighted mesh
stretch(e) = O(1)
stretch(e) = Ω(n^{1/2})
total stretch = Ω(n^{3/2})
‘haircomb’ is both shortest path and max weight spanning tree
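The Ω(n^{3/2}) bound can be reproduced on small grids. The sketch below assumes one particular 'haircomb': all horizontal edges plus the vertical edges of the first column, on a k-by-k grid with n = k^2. In the unit weight case the stretch of an edge is the length of its tree path, computed here by BFS, and the total is taken over all grid edges (tree edges contribute 1 each).

```python
from collections import deque

def tree_distance(adj, src, dst):
    """Number of edges on the tree path from src to dst (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("tree is not connected")

def haircomb_total_stretch(k):
    """Total stretch of all edges of the k-by-k unit grid w.r.t. the haircomb."""
    vertices = [(r, c) for r in range(k) for c in range(k)]
    horizontal = [((r, c), (r, c + 1)) for r in range(k) for c in range(k - 1)]
    vertical = [((r, c), (r + 1, c)) for r in range(k - 1) for c in range(k)]
    tree = horizontal + [e for e in vertical if e[0][1] == 0]  # spine in column 0
    adj = {v: [] for v in vertices}
    for u, v in tree:
        adj[u].append(v)
        adj[v].append(u)
    return sum(tree_distance(adj, u, v) for u, v in horizontal + vertical)

totals = {k: haircomb_total_stretch(k) for k in (2, 4, 8)}
```

For this tree the total works out to k(k^2 - 1), which grows like n^{3/2}, compared with the O(m log n) that a low stretch tree achieves.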
A BETTER TREE FOR THE GRID
Recursive ‘C’
LOW STRETCH SPANNING TREES
[Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08]: Any graph has a spanning tree with total stretch O(m log n) (hiding log log n)
ISSUE: RUNNING TIME
Algorithms given by [Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08] take O(n log^2 n + m log n) time
Reason: O(log n) shortest paths
SPEED UP
[Koutis-Miller-P `11]:
• Round edge weights to powers of 2
• k = log n, total work = O(m log n)
[Orlin-Madduri-Subramani-Williamson `10]: Shortest path on graphs with k distinct weights can run in O(mlogm/nk) time
Hiding log log n, we actually improve these
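A small sketch of the rounding step. The slide does not say whether weights are rounded up or down, so rounding up is an assumption here, as are the function name and the sample weights. Each weight changes by less than a factor of 2, and the number of distinct values k drops to roughly the log of the ratio between the largest and smallest weight.

```python
import math

def round_up_to_power_of_two(w):
    """Smallest power of two that is >= w (assumes w > 0)."""
    return 2.0 ** math.ceil(math.log2(w))

weights = [1, 3, 5, 6, 7, 8, 100]          # hypothetical edge weights
rounded = [round_up_to_power_of_two(w) for w in weights]
distinct_before = len(set(weights))
distinct_after = len(set(rounded))
```

Fewer distinct weights is what lets a shortest path routine specialised to k distinct weights be used in place of a general one.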
• [Blelloch-Gupta-Koutis-Miller-P-Tangwongsan `11]: current framework parallelizes to O(m^{1/3+a}) depth
• Combine with Laplacian paradigm for fast parallel graph algorithms
PARALLEL ALGORITHM?
• Before this work: parallel time > state of the art sequential time
• Our result: parallel work close to sequential, and O(m^{2/3}) time
PARALLEL GRAPH ALGORITHMS?
FUNDAMENTAL PROBLEM
Long standing open problem: theoretical speedups for BFS / shortest path in directed graphs
Sequential algorithms are too fast!
First step of framework by [Elkin-Emek-Spielman-Teng `05]:
PARALLEL ALGORITHM?
shortest path
• Workaround: use earlier algorithm by [Alon-Karp-Peleg-West `95]
• Idea: repeated clustering
• Based on ideas from [Cohen `93, `00] for approximating shortest path
PARALLEL TREE EMBEDDING
THE BIG PICTURE
• Need fast linear system solvers for graph regression
• Need combinatorial graph algorithms for fast solvers
minimize Ax=b
ONGOING / FUTURE WORK
• Better regression?
• Faster/parallel solver?
• Sparse approximate (pseudo) inverse?
• Other types of systems?
THANK YOU!
Questions?