Linear Sketching and Applications to Distributed Computation
Cameron Musco
November 7, 2014
Overview

Linear Sketches
- Johnson-Lindenstrauss Lemma
- Heavy-Hitters

Applications
- k-Means Clustering
- Spectral sparsification
Linear Sketching
- Randomly choose Π ∼ D
Linear Sketching
- Oblivious: Π is chosen independently of x.
- Composable: Π(x + y) = Πx + Πy.
Linear Sketching
- Streaming algorithms with polylog(n) space.
- Frequency moments, heavy hitters, entropy estimation

As the vector x is updated coordinate-by-coordinate in the stream, the sketch is maintained by linearity:

Πx0 → Π(x0 + x1) → … → Π(x0 + x1 + … + xn)
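The composability above can be checked directly. A minimal pure-Python sketch, assuming a dense Gaussian Π with illustrative dimensions (the slides do not fix any of these constants):

```python
import random

random.seed(0)
n, m = 1000, 50  # original and sketch dimensions (illustrative)

# A fixed random sketching matrix Pi (m x n Gaussian entries).
Pi = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

def sketch(x):
    """Compute Pi @ x for a length-n vector x."""
    return [sum(Pi[i][j] * x[j] for j in range(n)) for i in range(m)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Stream of updates x_0, x_1, ...; by linearity, Pi(x_0 + x_1)
# equals Pi(x_0) + Pi(x_1), so the sketch is maintained by
# adding the sketch of each update as it arrives.
x0 = [1.0 if i < 10 else 0.0 for i in range(n)]
x1 = [0.5 if i % 7 == 0 else 0.0 for i in range(n)]

left = sketch(add(x0, x1))           # sketch of the summed stream
right = add(sketch(x0), sketch(x1))  # sum of the two sketches
assert all(abs(a - b) < 1e-9 for a, b in zip(left, right))
```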
Linear Sketching

Two Main Tools
- Johnson-Lindenstrauss sketches (randomized dimensionality reduction, subspace embedding, etc.)
- Heavy-Hitters sketches (sparse recovery, compressive sensing, ℓp sampling, graph sketching, etc.)
Johnson-Lindenstrauss Lemma
- Low-dimensional embedding: n → m = O(log(1/δ)/ε²)
- ‖Πx‖₂² ≈ε ‖x‖₂²

Π = (1/√m) · G, where G is an m × n matrix of i.i.d. N(0, 1) entries. By Gaussian stability, each coordinate of Πx is distributed as (1/√m) · N(0, x₁² + … + xₙ²) = (1/√m) · N(0, ‖x‖₂²), so

‖Πx‖₂² = (1/m) ∑ᵢ₌₁ᵐ N(0, ‖x‖₂²)² ≈ε ‖x‖₂²
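A quick numerical check of the lemma in pure Python; the constants m = 400 and n = 2000 are arbitrary illustrative choices, not from the slides:

```python
import math
import random

random.seed(42)
n, m = 2000, 400  # m would be O(log(1/δ)/ε²); sizes here are illustrative

# Pi: m x n with i.i.d. N(0,1) entries, scaled by 1/sqrt(m).
Pi = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

x = [random.uniform(-1, 1) for _ in range(n)]

# Apply the sketch: (Pi x)_i = (1/sqrt(m)) * sum_j G_ij x_j.
Pix = [sum(Pi[i][j] * x[j] for j in range(n)) / math.sqrt(m) for i in range(m)]

norm_x = sum(v * v for v in x)
norm_Pix = sum(v * v for v in Pix)

# The squared norm is preserved up to small relative error.
print(norm_Pix / norm_x)  # close to 1
```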
Johnson-Lindenstrauss Lemma

That's it! Just basic statistics. In practice:
- Sparse constructions.
- ±1 entries replace Gaussians.
- Small sketch representation, i.e. a small random seed (otherwise storing Π takes more space than storing x).
Heavy-Hitters
- Count sketch, sparse recovery, ℓp sampling, point query, graph sketching, sparse Fourier transform
- Simple idea: hashing
Heavy-Hitters
- Random signs to deal with negative entries
- Repeat many times, 'decode' heavy buckets

The sketch matrix has one row per bucket: entry (b, i) is a random sign if coordinate i hashes to bucket b and 0 otherwise. So Πx is the vector of signed bucket sums (c1, …, c1/ε), and each independent repetition uses a fresh hash function, giving a new set of buckets.
Heavy-Hitters
- Random signs to deal with negative entries
- Repeat many times, 'decode' heavy buckets

Across independent repetitions, coordinate 2 lands in different buckets, e.g. h1(2) = 1, h2(2) = 5, h3(2) = 20, h4(2) = 1/ε. From these buckets, x2 can be recovered whenever it is heavy, i.e. x2²/‖x‖₂² ≥ ε.
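The hash-and-decode idea can be sketched as a small count-sketch in pure Python. The bucket count, number of repetitions, and the planted heavy coordinate are illustrative choices, not from the slides:

```python
import random
import statistics

random.seed(1)
n = 10_000
reps, buckets = 7, 20  # illustrative; buckets plays the role of 1/ε

# Random hash functions h_t and sign functions s_t, stored as tables.
h = [[random.randrange(buckets) for _ in range(n)] for _ in range(reps)]
s = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(reps)]

# x has one heavy coordinate and small noise everywhere else.
x = [random.uniform(-0.1, 0.1) for _ in range(n)]
x[123] = 500.0

# The sketch: reps x buckets counters, a linear function of x.
C = [[0.0] * buckets for _ in range(reps)]
for t in range(reps):
    for i in range(n):
        C[t][h[t][i]] += s[t][i] * x[i]

def estimate(i):
    """Median-of-signs decoding of coordinate i from its buckets."""
    return statistics.median(s[t][i] * C[t][h[t][i]] for t in range(reps))

print(abs(estimate(123) - 500.0))  # small compared to 500
```

The median over repetitions is what makes a single colliding coordinate harmless: it can corrupt a bucket in one repetition, but not in most of them.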
Heavy-Hitters
- Just a random encoding
- polylog(n) space to recover entries carrying a 1/log(n) fraction of the total norm
- Basically the best we could hope for.
- Random scalings give ℓp sampling.
Application 1: k-means Clustering
- Assign points to k clusters
- k is fixed
- Minimize distance to centroids: ∑ᵢ₌₁ⁿ ‖aᵢ − μC(i)‖₂²
Application 1: k-means Clustering

Lloyd's algorithm, aka the 'k-means algorithm':
- Initialize random clusters
- Compute centroids
- Assign each point to the closest centroid
- Repeat
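Lloyd's iteration can be sketched in a few lines of plain Python; the 2-D data, blob centers, and iteration count below are illustrative:

```python
import random

random.seed(0)

def lloyd(points, k, iters=20):
    """Plain Lloyd's algorithm ('the k-means algorithm')."""
    # Initialize centroids as k random data points.
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centroids

# Two well-separated 2-D blobs; centroids should land near their means.
pts = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(100)]
pts += [(random.gauss(10, 0.5), random.gauss(10, 0.5)) for _ in range(100)]
cents = sorted(lloyd(pts, 2))
print(cents)  # roughly (0, 0) and (10, 10)
```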
Application 1: k-means Clustering
What if data is distributed?
Application 1: k-means Clustering
- At each iteration, each server sends out its new local centroids
- Adding them together gives the new global centroids.
- O(sdk) communication per iteration
Application 1: k-means Clustering

Can we do better?
- Balcan et al. 2013: O((kd + sk)d)
- Locally computable coreset of size O(kd + sk).
- All data is aggregated and k-means performed on a single server.
Can we do even better?
- O((kd + sk)d) → O((kd′ + sk)d′) for d′ ≪ d.
- Liang et al. 2013, Balcan et al. 2014: O((k² + sk)k + sdk)
- CEMMP '14: improved O(k/ε²) to ⌈k/ε⌉.
Application 1: k-means Clustering

Can we do better?
- O(sdk) is inherent in communicating O(k) singular vectors of dimension d to s servers.
- Apply Johnson-Lindenstrauss!
- Goal: minimize distance to centroids: ∑ᵢ₌₁ⁿ ‖aᵢ − μC(i)‖₂²
- Equivalently, all pairwise distances: ∑ᵢ₌₁ᵏ (1/|Cᵢ|) ∑₍ⱼ,ₗ₎∈Cᵢ ‖aⱼ − aₗ‖₂²
Application 1: k-means Clustering
- ‖aᵢ − aⱼ‖₂² ≈ε ‖(aᵢ − aⱼ)Π‖₂² = ‖aᵢΠ − aⱼΠ‖₂²
- O(n²) distance vectors, so set the failure probability δ = 1/(100n²).
- Π needs O(log(1/δ)/ε²) = O(log n/ε²) dimensions
Application 1: k-means Clustering

Immediately distributes: we just need to share the randomness specifying Π.
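A sketch of the shared-randomness idea: each server rebuilds the same Π from a common seed, projects its own local points, and pairwise distances between points on different servers are approximately preserved. All sizes and seeds below are illustrative:

```python
import math
import random

def make_pi(seed, d, m):
    """Every server regenerates the same Pi from the shared seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) / math.sqrt(m) for _ in range(d)]
            for _ in range(m)]

def project(Pi, a):
    """Sketch a single d-dimensional point down to m dimensions."""
    return [sum(row[j] * a[j] for j in range(len(a))) for row in Pi]

d, m, shared_seed = 500, 200, 7  # illustrative sizes

# Two points held on two different servers.
rng = random.Random(1)
a = [rng.uniform(-1, 1) for _ in range(d)]  # on server 1
rng = random.Random(2)
b = [rng.uniform(-1, 1) for _ in range(d)]  # on server 2

# Each server builds Pi independently from the shared seed.
pa = project(make_pi(shared_seed, d, m), a)
pb = project(make_pi(shared_seed, d, m), b)

dist = sum((u - v) ** 2 for u, v in zip(a, b))
sketch_dist = sum((u - v) ** 2 for u, v in zip(pa, pb))
print(sketch_dist / dist)  # close to 1
```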
Application 1: k-means Clustering

Our Paper [Cohen Elder Musco Musco Persu '14]
- Show that Π only needs O(k/ε²) columns.
- Almost completely removes the dependence on input size!
- O(k³ + sk² + log d); the log d gets swallowed by the word size.
Application 1: k-means Clustering

Highest-level idea of how this works:
- Show that the cost of projecting the columns of AΠ onto any k-dimensional subspace approximates the cost of projecting A onto that subspace.
- Note that k-means can actually be viewed as a column projection problem.
- k-means clustering is 'constrained' PCA.
- Lots of applications aside from k-means clustering.
Application 1: k-means Clustering

Open Questions
- (9 + ε)-approximation with only O(log k) dimensions! What is the right answer?
- We use O(kd + sk)-sized coresets as a black box and reduce d. Can we use our linear-algebraic understanding to improve coreset constructions? It feels like we should be able to.
- These algorithms should be practical. I think testing them out would be useful, for both k-means and PCA.
- Other problems (spectral clustering, SVM; what do people actually do?)
Application 2: Spectral Sparsification

General Idea
- Approximate a dense graph with a much sparser graph.
- Reduce O(n²) edges → O(n log n) edges
Application 2: Spectral Sparsification

Cut Sparsification (Benczur, Karger '96)
- Preserve every cut value to within a (1 ± ε) factor

Applications: minimum cut, sparsest cut, etc.
Graph Sparsification

Cut Sparsification (Benczur, Karger '96)
- Let B ∈ R^((n choose 2) × n) be the vertex-edge incidence matrix for a graph G.
- Let x ∈ {0, 1}ⁿ be an "indicator vector" for some cut.

‖Bx‖₂² = cut value

Example on four vertices with edges e12, e13, e23, e24 (rows for absent edges are zero):

        v1   v2   v3   v4
  e12    1   -1    0    0
  e13    1    0   -1    0
  e14    0    0    0    0
  e23    0    1   -1    0
  e24    0    1    0   -1
  e34    0    0    0    0

With x = (1, 1, 0, 0)ᵀ, the indicator of the cut {v1, v2}, we get Bx = (0, 1, 0, 1, 1, 0)ᵀ, so ‖Bx‖₂² = 3.
Graph Sparsification
Cut Sparsification (Benczur, Karger ’96). So, ‖Bx‖₂² = cut value.

Goal: Find some B̃ such that, for all x ∈ {0, 1}ⁿ,

(1 − ε)‖Bx‖₂² ≤ ‖B̃x‖₂² ≤ (1 + ε)‖Bx‖₂²

▶ xᵀB̃ᵀB̃x ≈ xᵀBᵀBx.

▶ L = BᵀB is the graph Laplacian.
Graph Sparsification
Spectral Sparsification (Spielman, Teng ’04)

Goal: Find some B̃ such that, for all x ∈ Rⁿ (not just {0, 1}ⁿ),

(1 − ε)‖Bx‖₂² ≤ ‖B̃x‖₂² ≤ (1 + ε)‖Bx‖₂²

Applications: anything cut sparsifiers can do, Laplacian system solves, computing effective resistances, spectral clustering, calculating random walk properties, etc.
Graph Sparsification
How are sparsifiers constructed?
Randomly sample edges (i.e. rows from B):
Graph Sparsification
How are sparsifiers constructed?
Sampling probabilities:

▶ Connectivity for cut sparsifiers [Benczur, Karger ’96], [Fung, Hariharan, Harvey, Panigrahi ’11].

▶ Effective resistances (i.e. statistical leverage scores) for spectral sparsifiers [Spielman, Srivastava ’08].

Actually oversample: by (effective resistance) × O(log n). This gives sparsifiers with O(n log n) edges, reducing from up to O(n²).
Application 2: Spectral Sparsification
Highest Level Idea Of Our Approach
Application 2: Spectral Sparsification
Why?
▶ Semi-streaming model with insertions and deletions

▶ Near-optimal oblivious graph compression

▶ Distributed graph computations
Application 2: Spectral Sparsification

Distributed Graph Computation

▶ Trinity, Pregel, Giraph
Application 2: Spectral Sparsification
▶ Naive sharing of my data: O(|Vi| n)

▶ With sketching: O(|Vi| logᶜ n)
Application 2: Spectral Sparsification

Alternatives to Sketching?

▶ Simulate message passing algorithms over the nodes; this is what’s done in practice.
Application 2: Spectral Sparsification
Alternatives to Sketching?
▶ Koutis ’14 gives a distributed algorithm for spectral sparsification.

▶ It iteratively computes O(log n) spanners (alternatively, low-stretch trees) to upper bound effective resistances and sample edges.

▶ Combinatorial and local.
Application 2: Spectral Sparsification
▶ Cost per spanner: O(log² n) rounds, O(m log n) messages, O(log n) message size.

▶ If simulating, each server sends O(δ(Vi) log n) per round.

▶ O(δ(Vi) log n) beats our bound of O(|Vi| log n) iff δ(Vi) ≤ |Vi|.

▶ But in that case, just keep all your outgoing edges and sparsify locally! At worst this adds n edges to the final sparsifier.
Application 2: Spectral Sparsification
Moral of That Story?

▶ I’m not sure.

▶ Sparsifiers are very strong. Could we do better for other problems?

▶ Can we reduce communication of simulated distributed protocols using sparsifiers?

▶ What other things can sketches be applied to? The biggest open question is distances: spanners, etc.
Sketching a Sparsifier
We are still going to sample by effective resistance.

▶ Treat the graph as a resistor network; each edge has resistance 1.

▶ Flow 1 unit of current from node i to node j and measure the voltage drop between the nodes.
Sketching a Sparsifier
Using standard V = IR equations:

If xe = (1, 0, 0, −1, 0)ᵀ (+1 and −1 at e’s endpoints), e’s effective resistance is τe = xeᵀL⁻¹xe.
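A quick numerical check of this formula, with L⁻¹ taken as the Moore–Penrose pseudoinverse (the Laplacian is singular). On a 3-node path, two unit resistors in series should give resistance 2 between the endpoints:

```python
import numpy as np

# Path graph v1 - v2 - v3: each edge is a unit resistor.
B = np.array([[1.0, -1.0,  0.0],   # edge (v1, v2)
              [0.0,  1.0, -1.0]])  # edge (v2, v3)
L = B.T @ B
Lp = np.linalg.pinv(L)  # L is singular; use the pseudoinverse

def effective_resistance(i, j):
    x = np.zeros(3)
    x[i], x[j] = 1.0, -1.0   # push 1 unit of current in at i, out at j
    return x @ Lp @ x        # resulting voltage drop between i and j

r_adjacent  = effective_resistance(0, 1)  # one resistor: 1
r_endpoints = effective_resistance(0, 2)  # two in series: 1 + 1 = 2
```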
Sketching a Sparsifier
Effective resistance of edge e is τe = xeᵀL⁻¹xe.

Alternatively, τe is the e-th entry of the vector BL⁻¹xe, AND

τe = xeᵀL⁻¹xe = xeᵀ(L⁻¹)ᵀBᵀBL⁻¹xe = ‖BL⁻¹xe‖₂²
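Both identities are easy to verify numerically. Here is a small sketch on the triangle graph, where every edge has τe = 2/3 (a unit resistor in parallel with a two-resistor path), again using the pseudoinverse for L⁻¹:

```python
import numpy as np

# Triangle on 3 vertices; B has one +1/-1 row per edge.
B = np.array([[1.0, -1.0,  0.0],   # e12
              [1.0,  0.0, -1.0],   # e13
              [0.0,  1.0, -1.0]])  # e23
L = B.T @ B
Lp = np.linalg.pinv(L)

# tau_e = x_e^T L^-1 x_e for each edge (x_e is just row e of B).
taus = np.array([B[e] @ Lp @ B[e] for e in range(3)])

# Column e of V is the vector B L^-1 x_e from the slide.
V = B @ Lp @ B.T
```

The e-th diagonal entry of V and the squared norm of its e-th column should both equal τe.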
Sketching a Sparsifier
We just need two more ingredients to recover edges from BL⁻¹xe.

ℓ₂ Heavy Hitters [GLPS10]:

▶ Sketch a poly(n)-length vector in polylog(n) space.

▶ Extract any element whose square is an Ω(1/log n) fraction of the vector’s squared norm.

Coarse Sparsifier:

▶ L̃ such that xᵀL̃x = (1 ± constant) xᵀLx
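The heavy-hitters primitive can be illustrated with a toy CountSketch, one standard ℓ₂ heavy-hitters construction (the actual [GLPS10] scheme differs in its details; the row and bucket counts here are illustrative). Like the sketches in this talk, it is linear in the input vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def countsketch(v, rows=5, cols=64):
    """Sketch v into a rows x cols table: each row hashes every index
    to a random bucket with a random sign and accumulates."""
    h = rng.integers(0, cols, size=(rows, v.size))    # bucket hashes
    s = rng.choice([-1.0, 1.0], size=(rows, v.size))  # sign hashes
    S = np.zeros((rows, cols))
    for r in range(rows):
        np.add.at(S[r], h[r], s[r] * v)  # accumulate signed values per bucket
    return S, h, s

def estimate(S, h, s, i):
    """Median-of-rows estimate of v[i]; accurate for entries whose
    square is a large fraction of ||v||_2^2."""
    return np.median([s[r, i] * S[r, h[r, i]] for r in range(S.shape[0])])

v = rng.normal(0.0, 0.1, size=1000)  # small "noise" mass everywhere
v[42] = 10.0                         # one heavy coordinate
S, h, s = countsketch(v)
est = estimate(S, h, s, 42)          # close to 10, from 320 stored numbers
```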
Sketching a Sparsifier
Putting it all together, to recover the heavy entries of BL⁻¹xe:

1. Sketch (Π_heavy hitters)B in n logᶜ n space.

2. Compute (Π_heavy hitters)BL⁻¹.

3. For every possible edge e, compute (Π_heavy hitters)BL⁻¹xe.

4. Extract the heavy hitters from the vector and check whether the e-th entry is one of them.

(BL⁻¹xe)(e)² / ‖BL⁻¹xe‖₂² ≈ τe² / τe = τe

So, as long as τe = Ω(1/log n), we will recover the edge!
Sketching a Sparsifier
How about edges with lower effective resistance? Sketch:

BL⁻¹xe
Sketching a Sparsifier
How about edges with lower effective resistance? Subsample the rows of B and sketch BL⁻¹xe at each level:

▶ First level: recovers τe > 1/log n, each edge kept with probability 1.

▶ Second level: τe > 1/(2 log n), with probability 1/2.

▶ Third level: τe > 1/(4 log n), with probability 1/4.

▶ Fourth level: τe > 1/(8 log n), with probability 1/8.

▶ ...

So, we can sample every edge with probability (effective resistance) × O(log n).
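The level structure above can be sketched directly (constants illustrative): an edge with resistance τe first becomes recoverable at the level whose threshold drops below τe, and it survives the subsampling to that level with probability 2^−ℓ, which works out to roughly min(1, τe · log n) overall, up to a factor of 2:

```python
import math

n = 1024
log_n = math.log2(n)

def recovery_probability(tau, levels=12):
    """Level l keeps each edge with probability 2**-l and can recover
    edges with tau > 1/(2**l * log n); return the survival probability
    at the first level where the edge becomes recoverable."""
    for l in range(levels):
        if tau > 1.0 / (2**l * log_n):
            return 2.0**-l
    return 0.0

# Effective sampling probability for a few resistance values.
probs = {tau: recovery_probability(tau) for tau in (0.5, 0.05, 0.005)}
```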
Sparsifier Chain

Final Piece [Li, Miller, Peng ’12]

▶ We needed a constant-error spectral sparsifier to get our (1 ± ε) sparsifier.