Consistency of Spectral Algorithms for Hypergraphs under Planted Partition Model
Debarghya Ghoshdastidar
Ph.D. Thesis Defense
Advisor: Prof. Ambedkar Dukkipati
January 2, 2017
Debarghya Ghoshdastidar Ph.D. Thesis Defense Jan 2, 2017 1 / 47
Overview
Purpose of the work:
Theoretical study of spectral methods for hypergraph partitioning
Contributions:
Model for random hypergraphs with planted partition
Error bounds for partitioning planted hypergraphs
New algorithms with improved error rates
Analysis of edge sampling strategies
Bi-partite hypergraph coloring
Spectral Algorithm for Graph Partitioning: Spectral Clustering
Graph Partitioning
Objective:
High connectivity within clusters
Few edges across clusters (small cut)
Balanced partitions
Applications:
Network partitioning · Data clustering · Image segmentation
Spectral Graph Partitioning / Spectral Clustering
Input graph → good balanced cut
Pipeline: (normalized) adjacency matrix → find k dominant eigenvectors → run k-means on the rows
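The pipeline above can be sketched in a few lines. This is a minimal illustration with numpy only; the function name, the example graph, and the naive deterministic seeding (used in place of k-means++ for brevity) are all illustrative, not from the thesis:

```python
import numpy as np

def spectral_clustering(A, k, n_iter=50):
    """Sketch: normalized adjacency -> k dominant eigenvectors ->
    plain Lloyd's k-means on the rows of the eigenvector matrix."""
    d = np.maximum(A.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)                          # eigenvalues ascending
    X = vecs[:, -k:]                                     # k dominant eigenvectors
    # naive deterministic seeding for this sketch (not k-means++)
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# two dense blocks of 3 nodes joined by a single edge
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
A[2, 3] = A[3, 2] = 1.0
labels = spectral_clustering(A, 2)
```

On this toy graph the two triangles end up in different clusters despite the bridging edge.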
Theoretical analysis
Stochastic block model: [Holland, Laskey & Leinhardt '83]
Random hypergraph (V, E) on |V| = n nodes
Nodes have (hidden) class labels, ψ : {1, …, n} → {1, …, k}
P(e_uv ∈ E) depends on the labels of u, v
Question:
Error(ψ, ψ′) = min_σ ∑_{i=1}^n 1{ψ_i ≠ σ(ψ′_i)}   (ψ′ is the output labeling, σ ranges over label permutations)
Find β_n such that
Error(ψ, ψ′) ≤ β_n with probability 1 − o(1)
Consistency of algorithms:
Weakly consistent if βn = o(n); Strongly consistent if βn = o(1)
Spectral clustering is weakly consistent [Rohe, Chatterjee & Yu '11]
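The error measure above minimizes over label permutations σ. A brute-force sketch, fine for small k (labels are assumed to be 0, …, k−1; the function name is illustrative):

```python
from itertools import permutations

def partition_error(psi, psi_prime, k):
    """Error(psi, psi') from the slide: minimum over label
    permutations sigma of #{i : psi_i != sigma(psi'_i)}.
    Brute force over all k! permutations."""
    best = len(psi)
    for sigma in permutations(range(k)):
        mismatches = sum(a != sigma[b] for a, b in zip(psi, psi_prime))
        best = min(best, mismatches)
    return best

psi       = [0, 0, 1, 1, 2, 2]   # hidden labels
psi_prime = [2, 2, 0, 0, 1, 1]   # same partition, labels renamed
print(partition_error(psi, psi_prime, 3))   # -> 0
```

Renaming clusters costs nothing; only genuinely misplaced nodes count.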
Hypergraph Partitioning: Applications and Algorithms
Hypergraphs
Collection of sets / Generalization of graphs
Each edge can connect more than two nodes
Graph (2-uniform hypergraph) vs. 3-uniform hypergraph
m-uniform hypergraph:
Each edge connects m nodes
Adjacencies can be represented by an mth-order tensor:
A_{i1 i2 … im} = 1 if there is an edge on {i1, i2, …, im}, and 0 otherwise
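A minimal sketch of building such an adjacency tensor; the helper name and the example edges are illustrative:

```python
import numpy as np
from itertools import permutations

def adjacency_tensor(n, edges):
    """Adjacency tensor of an m-uniform hypergraph as on the slide:
    entry 1 whenever (i1, ..., im) is an ordering of some edge,
    so the tensor is symmetric in its indices."""
    m = len(edges[0])
    A = np.zeros((n,) * m)
    for e in edges:
        for idx in permutations(e):   # symmetrize over orderings
            A[idx] = 1.0
    return A

# 3-uniform hypergraph on 4 nodes with edges {0,1,2} and {1,2,3}
A = adjacency_tensor(4, [(0, 1, 2), (1, 2, 3)])
```

Each edge contributes m! = 6 unit entries, one per ordering of its nodes.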
Hypergraphs in Databases [Gibson, Kleinberg & Raghavan '00]
Gender:  Male | Female | Male | Male  | Female
Hair:    Red  | Black  | Bald | Black | Red
Glasses: Yes  | No     | Yes  | No    | No
Edges can be of varying sizes (non-uniform hypergraph)
Male, Black hair, Without glasses, and so on . . .
Hypergraphs in Computer Vision [Agarwal et al. '05]
Subspace clustering · Motion segmentation · Matching / Image registration
Involves 3-way / 4-way similarities (uniform hypergraph)
Hypergraph Partitioning Methods
Partitioning circuits [Schweikert & Kernighan '79]
Graph approximation for hypergraphs [Hadley '95]
Spectral hypergraph partitioning [Zien et al. '99]
hMETIS for VLSI design [Karypis & Kumar '00]
Uniform hypergraph in databases [Gibson et al. '00]
Uniform hypergraph in vision [Agarwal et al. '05]
Tensor based algorithms [Govindu '05; Chen & Lerman '09]
Learning with non-uniform hypergraph [Zhou et al. '07]
Higher order learning [Duchenne et al. '11; Rota Bulo & Pelillo '13; etc.]
Algorithms studied in our work
HOSVD / SCC: [Govindu '05; Chen & Lerman '09]
Uniform hypergraph partitioning using higher-order SVD of the adjacency tensor.
TTM / TeTrIS: (proposed)
Uniform hypergraph partitioning by solving a tensor trace maximization problem.
TeTrIS is an efficient (sampled) version of TTM.
NH-Cut: [Zhou, Huang & Scholkopf '07]
Non-uniform hypergraph partitioning by minimizing normalized hypergraph cut.
COLOR: (proposed)
Vertex 2-coloring of bi-partite non-uniform hypergraph.
Uniform Hypergraph Partitioning: Spectral Algorithms
Approach 1: Higher order SVD of adjacency tensor
Approach 2: Associativity or tensor trace maximization
Approach 1: Higher Order SVD
Matrix eigendecomposition:
A = U Σ Uᵀ, with U orthonormal and Σ diagonal
HOSVD of a 3rd-order tensor: [De Lathauwer et al. '00]
A = S ×₁ U⁽¹⁾ ×₂ U⁽²⁾ ×₃ U⁽³⁾, with orthonormal factor matrices U⁽ⁱ⁾ and core tensor S
HOSVD based Partitioning [Govindu '05]
m-uniform hypergraph
Adjacency tensor A → flattened matrix A (n × n^{m−1}) → find dominant left singular vectors → run k-means on the rows
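The spectral step above (flatten the tensor, then take dominant left singular vectors) can be sketched as follows; the example hypergraph with two groups of four nodes is illustrative:

```python
import numpy as np
from itertools import combinations, permutations

def hosvd_embedding(A_tensor, k):
    """HOSVD spectral step (sketch): mode-1 flattening of the adjacency
    tensor, then the k dominant left singular vectors; k-means on the
    rows would complete the partitioning."""
    n = A_tensor.shape[0]
    A_flat = A_tensor.reshape(n, -1)              # n x n^(m-1) flattening
    U, _, _ = np.linalg.svd(A_flat, full_matrices=False)
    return U[:, :k]                               # rows embed the n nodes

# 3-uniform hypergraph: all triples inside each of two groups of 4 nodes
n, groups = 8, [(0, 1, 2, 3), (4, 5, 6, 7)]
A = np.zeros((n, n, n))
for g in groups:
    for e in combinations(g, 3):
        for idx in permutations(e):
            A[idx] = 1.0
X = hosvd_embedding(A, 2)
```

Rows of the embedding coincide within a group and stay well separated across groups, so the k-means step is trivial here.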
Approach 2: Associativity Maximization
Normalized associativity:
For any cluster V₁ ⊂ V,
associativity(V₁) = ∑_{e ⊆ V₁} w(e),   volume(V₁) = ∑_{v ∈ V₁} degree(v)
Normalized associativity of a partition:
N-assoc(V₁, …, V_k) = ∑_{j=1}^k associativity(V_j) / volume(V_j)
Problem:
Find partition that maximizes N-assoc(V1, . . . ,Vk)
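A direct computation of N-assoc for a small weighted hypergraph; the function name and the example are illustrative:

```python
def n_assoc(edges, weights, clusters):
    """Normalized associativity from the slide:
    sum_j associativity(V_j) / volume(V_j), where
    associativity(V_j) = total weight of edges inside V_j and
    volume(V_j)        = sum of weighted degrees of nodes in V_j."""
    total = 0.0
    for Vj in clusters:
        Vj = set(Vj)
        assoc = sum(w for e, w in zip(edges, weights) if set(e) <= Vj)
        vol = sum(w for e, w in zip(edges, weights)
                  for v in e if v in Vj)       # w(e) counted once per member
        total += assoc / vol
    return total

edges   = [(0, 1, 2), (0, 1, 3), (2, 3, 4), (3, 4, 5)]
weights = [1.0, 1.0, 1.0, 1.0]
score = n_assoc(edges, weights, [{0, 1, 2}, {3, 4, 5}])   # 1/6 + 1/6
```

Each cluster contains one of the four edges entirely, and each cluster has volume 6, giving 1/3 in total.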
Tensor Trace Maximization (TTM)
Problem (reformulated):
For an m-uniform hypergraph,
N-assoc(V₁, …, V_k) = (1/m!) · Trace( A ×₁ Y^{b₁} ×₂ ⋯ ×ₘ Y^{b_m} ),
where Y ∈ R^{n×k} has orthogonal columns, and ∑_j b_j = 1
Spectral relaxation of TTM:
Set b₁ = b₂ = 1/2, b₃ = … = b_m = 0 and X = Y^{1/2}
Optimize over all orthonormal X
Spectral TTM Algorithm [Ghoshdastidar & Dukkipati, ICML'15]
m-uniform hypergraph: adjacency tensor A → add slices of the tensor to form matrix A → find k dominant eigenvectors → run k-means on the rows
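The "add slices of tensor" step for m = 3 is just a sum along one mode; a sketch, with an illustrative two-edge example:

```python
import numpy as np
from itertools import permutations

def ttm_matrix(A_tensor):
    """Spectral TTM step for a 3-uniform hypergraph (sketch): sum the
    slices of the adjacency tensor into an n x n matrix; its k dominant
    eigenvectors then feed the usual k-means step."""
    return A_tensor.sum(axis=2)

# symmetric adjacency tensor for edges {0,1,2} and {1,2,3}
n = 4
A = np.zeros((n, n, n))
for e in [(0, 1, 2), (1, 2, 3)]:
    for idx in permutations(e):
        A[idx] = 1.0
M = ttm_matrix(A)
```

Entry M[i, j] counts the edges containing both i and j, so the slice sum acts like a weighted graph reduction of the hypergraph.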
Uniform Hypergraph Partitioning: Consistency
Planted partition model for uniform hypergraphs
Error bounds for algorithms
Planted Partition Model (graph)
Sparse Stochastic Block Model: [Lei & Rinaldo '15]
Given n nodes, and k (hidden) classes
An unknown matrix B ∈ [0, 1]k×k symmetric
An unknown sparsity factor αn
Independent edges with probabilities depending on labels
Classes: Class-1, Class-2, Class-3
Prob(edge between two Class-1 nodes) = α_n B₁₁, between Class-1 and Class-2 = α_n B₁₂, between Class-1 and Class-3 = α_n B₁₃, …
Planted Partition Model (uniform hypergraph)
Extension of Sparse SBM: (proposed)
Given n nodes, and k (hidden) classes
Unknown mth-order tensor B ∈ [0, 1]k×k×...×k
Unknown sparsity factor αn
Independent edges with label-dependent distribution
Unweighted hypergraph: Prob(edge) = α_n B_{i1 i2 … im}
Weighted hypergraph: w(edge) ∈ [0, 1] with E[w(edge)] = α_n B_{i1 i2 … im}
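A sketch of sampling from this model in the unweighted 3-uniform case; the function and variable names are illustrative, and the sanity check uses the degenerate setting of within-cluster probability 1 and 0 otherwise:

```python
import numpy as np
from itertools import combinations

def planted_3uniform(labels, B, alpha, seed=0):
    """Sample an unweighted 3-uniform planted-partition hypergraph:
    each triple {i,j,l} becomes an edge independently with probability
    alpha * B[label_i, label_j, label_l]."""
    rng = np.random.default_rng(seed)
    edges = []
    for (i, j, l) in combinations(range(len(labels)), 3):
        p = alpha * B[labels[i], labels[j], labels[l]]
        if rng.random() < p:
            edges.append((i, j, l))
    return edges

# degenerate check: within-cluster triples certain, all others impossible
B = np.zeros((2, 2, 2))
B[0, 0, 0] = B[1, 1, 1] = 1.0
edges = planted_3uniform([0, 0, 0, 1, 1, 1], B, alpha=1.0)
```

With these extreme probabilities exactly the two within-cluster triples appear; smaller α_n values sparsify the hypergraph.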
Consistency of HOSVD [Ghoshdastidar & Dukkipati, NIPS, 2014]
Define:
n_max (n_min) = maximum (minimum) cluster size
𝒜 = E[A Aᵀ] and A_min = min{𝒜_ij : 𝒜_ij > 0}
δ = kth eigen-gap of normalized 𝒜
Theorem
There exists a constant C > 0 such that if
δ > 0 and A_min > C k n_max (log n)² / (n_min δ²),
then with probability 1 − o(1),
Error(ψ, ψ′) = O( k n_max log n / (δ² A_min) ) = o(n).
Consistency of TTM [Ghoshdastidar & Dukkipati, ICML, 2015]
Define:
d = min_i E[degree(i)] = min_i ∑_{e ∋ i} E[w(e)]
δ = kth eigen-gap of normalized E[A]
Theorem
There exists a constant C > 0 such that if
δ > 0 and d > C k n_max (log n)² / (n_min δ²),
then with probability 1 − o(1),
Error(ψ, ψ′) = O( k n_max log n / (δ² d) ) = o(n).
Special Case
m-uniform hypergraph
k = O(log n) clusters of equal size
Edge probabilities
Prob(edge) = α_n p if the edge lies within a cluster, α_n q otherwise (p > q)

Allowable sparsity:
HOSVD: α_n = Ω( (log n)^{m+1.5} / n^{(m−1)/2} )    TTM: α_n = Ω( (log n)^{2m+1} / n^{m−1} )
Error for dense hypergraphs (α_n = 1):
HOSVD: O( (log n)^{2m+1} / n^{m−2} )    TTM: O( (log n)^{2m−1} / n^{m−2} )
Non-uniform Hypergraph Partitioning: Algorithm and Consistency
Approach 3: Normalized hypergraph cut minimization
Planted partition model for non-uniform hypergraphs
Consistency result (with proof sketch)
Normalized Hypergraph Cut
Approach: [Zhou, Huang & Scholkopf '07]
Solve spectral relaxation of minimizing normalized hypergraph cut
Reduction to graph:
A, D ∈ R^{n×n} so that A_ij = ∑_{e ∋ i,j} 1/|e|, D_ii = degree(i)
Spectral clustering:
Normalized Laplacian, L = I − D^{−1/2} A D^{−1/2}
Compute k leading orthonormal eigenvectors of L
k-means on normalized rows of eigenvector matrix
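The reduction above can be sketched directly. Here degree(i) is taken as the number of edges containing node i, which matches the weighted-degree convention for unit edge weights (a sketch, with illustrative names):

```python
import numpy as np

def nhcut_laplacian(n, edges):
    """Graph reduction used by NH-Cut: A_ij = sum over edges e
    containing both i and j of 1/|e|, D_ii = degree(i), and then
    L = I - D^{-1/2} A D^{-1/2}."""
    A = np.zeros((n, n))
    deg = np.zeros(n)
    for e in edges:
        for i in e:
            deg[i] += 1.0                    # edges containing node i
            for j in e:
                if i != j:
                    A[i, j] += 1.0 / len(e)  # 1/|e| per shared edge
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

L = nhcut_laplacian(3, [(0, 1, 2)])   # single 3-edge on three nodes
```

For the single 3-edge every degree is 1, so L = I − A with off-diagonal entries −1/3.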
Planted Partition Model (non-uniform hypergraph)
Model: (proposed)
Given n nodes, and k (hidden) classes
Maximum edge cardinality M
Unknown mth-order tensors B^{(m)} ∈ [0, 1]^{k×k×…×k}
Unknown sparsity factors αm,n, m = 2, 3, . . . ,M
Independent edges with label-dependent distribution
Prob(m-edge) = α_{m,n} B^{(m)}_{i1 i2 … im}
Consistency of NH-Cut [Ghoshdastidar & Dukkipati, Ann. Stat., 2017]
Define:
𝒜 = E[A], 𝒟 = E[D], and ℒ = I − 𝒟^{−1/2} 𝒜 𝒟^{−1/2}
d = min_i E[degree(i)]
δ = kth eigen-gap of ℒ
Theorem
There exists a constant C > 0 such that if
δ > 0 and d > C k n_max (log n)² / (n_min δ²),
then with probability 1 − o(1),
Error(ψ, ψ′) = O( k n_max log n / (δ² d) ) = o(n).
Proof of consistency
Stage 1: (expected case)
If δ > 0, then 𝒜 is essentially of rank k
If 𝒜 were used instead of A, then Error = 0
Stage 2: (matrix concentration)
A can be expressed as a sum of independent random matrices:
A = ∑_{e ∈ 2^V} 1{e ∈ E} · (1/|e|) h_e h_eᵀ
If d > 9 log n for all large n, then w.p. 1 − 4/n²,
‖L − ℒ‖₂ ≤ 12 √(log n / d)
Proof uses a matrix concentration inequality [Tropp '12]
Proof of consistency
Stage 3: (matrix perturbation)
X, 𝒳 = row-normalized eigenvector matrices of L, ℒ
If δ > 24 √(log n / d) for all large n, then w.p. 1 − 4/n²,
‖X − 𝒳‖_F ≤ (24/δ) √(2 k n_max log n / d)
Proof uses matrix perturbation bounds [Davis & Kahan '70]
Stage 4: (analyzing k-means)
Rows of 𝒳 are ε-separated for ε = (log n)^{−1/2}
k-means succeeds w.p. 1 − o(1), and Error = O(‖X − 𝒳‖_F²)
Based on guarantees for k-means [Ostrovsky et al. '12]
Sampling Hypergraph Edges
Consistency of partitioning with edge sampling
Approach 4: TTM with iterative sampling
Numerical comparison
Edge Sampling (weighted m-uniform hypergraph)
Complexity of tensor methods:
O(nm) runtime to compute all edge weights
Typically m = 3 to 8 in practice
Efficient variant: use only N ≪ n^m sampled edges
Question:
Edges sampled with replacement
Sampling distribution (pe)e∈E
Find min. number of samples needed for consistency
Sampling bound for TTM: [Ghoshdastidar & Dukkipati, arXiv:1602.06516]
(Special case) Error = o(n) if
Uniform sampling: N = Ω( α_n^{−1} k^{2m−1} n (log n)² )
Weighted sampling, p_e ∝ w(e): N = Ω( k^{2m−1} n (log n)² )
TTM with Iterative Sampling (TeTrIS)
Iterative Sampling:
Principle:
Sample edges with large weight more frequently
Edges within a cluster usually have large weight
Approach (SCC): [Chen & Lerman '09]
Sample few edges
Cluster using the HOSVD based method
Re-sample with preference to within-cluster edges
Re-cluster and repeat till convergence
TeTrIS Algorithm: [proposed]
Replace HOSVD step by TTM
The sampling bound for TTM justifies the usefulness of sampling large-weight edges via iterative sampling
Numerical Comparison
Motion Segmentation:
Cluster motion trajectories
Posed as subspace clustering problem
Each motion – subspace of dimension ≤ 4
Mean clustering error on Hopkins 155 data set (%)
Method        2-motion (120 videos)  3-motion (35 videos)  All
k-means       19.57                  26.16                 21.06
k-flats       13.05                  15.78                 13.67
SSC            1.53                   4.40                  2.18
LRR            2.13                   4.03                  2.56
NSN            3.62                   8.28                  4.67
SCC (HOSVD)    2.38                   5.71                  3.13
TeTrIS (TTM)   1.36                   5.38                  2.27
Hypergraph vertex 2-coloring
Objective: No edge can be mono-chromatic
Assume: planted bi-partite hypergraph with M = O(1) and E[#edges] ≥ C n log n
Algorithm: [Ghoshdastidar & Dukkipati, arXiv:1507.00763]
Spectral step:
Let A ∈ R^{n×n} with A_ij = ∑_{e ∋ i,j} 1/|e|
Compute the eigenvector x for the smallest eigenvalue of A
Color node i red if x_i > 0, else blue
⇒ Achieves error < cn for some constant c ≪ 1
Iterative refinement:
Re-color node i red if ∑_{j ∈ V_R} A_ij < ∑_{j ∈ V_B} A_ij, else blue
⇒ Error reduces by half in each iteration (log₂ n steps suffice)
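A sketch of both steps. The test case below is a bipartite structure given as 2-edges, though the construction of A handles edges of any cardinality; all names are illustrative:

```python
import numpy as np

def spectral_two_color(n, edges, refine_steps=5):
    """Sketch of the 2-coloring algorithm: sign of the eigenvector for
    the smallest eigenvalue of A gives an initial coloring; each
    refinement pass re-colors node i toward the side it is *less*
    connected to, as on the slide."""
    A = np.zeros((n, n))
    for e in edges:
        for i in e:
            for j in e:
                if i != j:
                    A[i, j] += 1.0 / len(e)
    _, vecs = np.linalg.eigh(A)          # eigenvalues ascending
    red = vecs[:, 0] > 0                 # initial coloring by sign
    for _ in range(refine_steps):
        to_red = A[:, red].sum(axis=1)   # weight toward current reds
        to_blue = A[:, ~red].sum(axis=1)
        red = to_red < to_blue           # join the lighter side
    return red

# bipartite structure: every edge joins the two sides {0,1,2} and {3,4,5}
edges = [(i, j) for i in range(3) for j in range(3, 6)]
red = spectral_two_color(6, edges)
```

On this example the spectral step already recovers the two sides exactly, and the refinement passes leave the coloring fixed.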
Summary
Hypergraph partitioning can be done efficiently
First study of planted non-uniform hypergraphs
Literature considers only planted k-SAT / 2-coloring
Statistical analysis of tensor based methods
Popular in practice, but no known error bound
Removing the assumptions on k-means
First study of sampled spectral algorithms
Justification for iterative sampling
Debarghya Ghoshdastidar Ph.D. Thesis Defense Jan 2, 2017 37 / 47
Further works & open questions
Extension to large-scale hypergraph partitioning
Down-sampling of hypergraphs
Analysis of other approaches under the planted model
Move-based strategies
Optimization-based algorithms
Study of sparse planted hypergraphs
Overlapping communities / degree heterogeneity
Algorithmic barrier for partitioning [Angelini et al. '15; Florescu & Perkins '16]
Generalization of graph problems to hypergraphs
Theoretical studies
Applications
Thank You
Acknowledgment: The work was supported by a Google Ph.D. Fellowship in Statistical Learning Theory
References
Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D. & Belongie,S. (2005). In IEEE Computer Vision and Pattern Recognition 838-845.
Angelini, M. C., Caltagirone, F., Krzakala, F. and Zdeborova, L. (2015). InAnnual Allerton Conference on Communication, Control, and Computing.
Chen, G. & Lerman, G. (2009). International Journal of Computer Vision81(3) 317-330.
Davis, C. & Kahan, W. M. (1970). SIAM Journal on Numerical Analysis 7(1)1-46.
De Lathauwer, L., De Moor, B. and Vandewalle, J. (2000). SIAM Journal onMatrix Analysis and Applications 21(4) 1253-1278.
Duchenne, O., Bach, F., Kweon, I.-S. & Ponce, J. (2011). IEEE Transactionson Pattern Analysis and Machine Intelligence 33(12) 2383-2395.
Florescu, L. & Perkins, W. (2016). In Conference on Learning Theory.
Ghoshdastidar, D. & Dukkipati, A. (2014). In Advances in Neural Information Processing Systems 397-405.
References
Ghoshdastidar, D. & Dukkipati, A. (2015). In International Conference onMachine Learning.
Ghoshdastidar, D. & Dukkipati, A. (2015). Annals of Statistics (in press).
Ghoshdastidar, D. & Dukkipati, A. (2015). arXiv preprint 1507.00763.
Ghoshdastidar, D. & Dukkipati, A. (2016). arXiv preprint 1602.06516.
Gibson, D., Kleinberg, J. & Raghavan, P. (2000). VLDB Journal 8 222-236.
Govindu, V. M. (2005). In IEEE Computer Vision and Pattern Recognition1150-1157.
Hadley, S. W. (1995). Discrete Applied Mathematics 59 115-127.
Hein, M., Setzer, S., Jost, L. and Rangapuram, S. (2013). In Advances in Neural Information Processing Systems 2427-2435.
Holland, P. W., Laskey, K. B. & Leinhardt, S. (1983). Social Networks 5109-137.
Karypis, G. & Kumar, V. (2000). VLSI Design 11 285-300.
Lei, J. & Rinaldo, A. (2015). Annals of Statistics 43 215-237.
References
Ostrovsky, R., Rabani, Y., Schulman, L. J. & Swamy, C. (2012). Journal of theACM 59(6) 28:128.
Rohe, K., Chatterjee, S., & Yu, B. (2011). Annals of Statistics 39 1878-1915.
Rota Bulo, S. & Pelillo, M. (2013). IEEE Transactions on Pattern Analysis and Machine Intelligence 35(6) 1312-1327.
Schweikert, G. & Kernighan, B. W. (1979). In Design Automation Workshop57-62.
Tropp, J. A. (2012). Foundations of Computational Mathematics 12(4)389-434.
Zien, J. Y., Schlag, M. D. F. and Chan, P. K. (1999). IEEE Transactions onComputer-Aided Design of Integrated Circuits and Systems 13(9) 1088-1096.
Zhou, D., Huang, J. and Scholkopf, B. (2007). In Advances in Neural Information Processing Systems 1601-1608.
Consistency of Sampled TTM (General Case)
Define:
γ = max_e w(e)/p(e), where p(e) = P(e is sampled)
d = min_i E[degree(i)], δ = kth eigen-gap of normalized E[A]
Theorem [Ghoshdastidar & Dukkipati '16]
There exist constants C, C′ > 0 such that if
δ > 0, d > C k n_max (log n)² / (n_min δ²),
and N > C′ (1 + 2γ/d) · k n_max (log n)² / (n_min δ²),
then with probability 1 − o(1),
Error(ψ, ψ′) = o(n).
More Numerical Results
Numerical Comparison (uniform hypergraph)
Subspace clustering
60 points in 5-dim ambient space
Data from union of three random lines (1-dim subspaces)
Data perturbed by Gaussian noise of standard deviation σa
Fractional error (over 20 runs)
Algorithm   σ_a = 0.02   σ_a = 0.05
SNTF        0.025        0.086
hMETIS      0.045        0.118
HGT         0.083        0.222
HOSVD       0.052        0.126
TTM         0.033        0.103
Numerical Comparison (sampled uniform hypergraph)
Subspace clustering
5-dim ambient space
Data from union of five 3-dim subspaces (added noise)
[Figure: fractional error (over 50 runs) vs. number of points in each subspace, n/k, at several noise levels σ_a]
Numerical Comparison (non-uniform hypergraph)
Categorical data clustering
Data set         #instances   #attributes   #attr. values
Voting records   435          16            3
Mushroom         8124         22            varies

Fractional error
Data set    ROCK   CoolCat   LIMBO   hMETIS   NH-Cut
Voting      0.16   0.15      0.13    0.24     0.12
Mushroom    0.43   0.27      0.11    0.48     0.11