
Consistency of Spectral Algorithms for Hypergraphs under Planted Partition Model

Debarghya Ghoshdastidar

Ph.D. Thesis Defense

Advisor: Prof. Ambedkar Dukkipati

January 2, 2017

Debarghya Ghoshdastidar Ph.D. Thesis Defense Jan 2, 2017 1 / 47


Overview

Purpose of the work:

Theoretical study of spectral methods for hypergraph partitioning

Contributions:

Model for random hypergraphs with planted partition

Error bounds for partitioning planted hypergraphs

New algorithms with improved error rates

Analysis of edge sampling strategies

Bipartite hypergraph coloring


Spectral Algorithm for Graph Partitioning: Spectral Clustering


Graph Partitioning

Objective:

High connectivity within clusters

Few edges across clusters (small cut)

Balanced partitions

Applications:

Network partitioning

Data clustering

Image segmentation


Spectral Graph Partitioning / Spectral Clustering

Pipeline: input graph → (normalized) adjacency matrix → k dominant eigenvectors → k-means on the rows → good balanced cut
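As a rough illustration of this pipeline, here is a minimal NumPy sketch (our own, not code from the thesis). It assumes a symmetric weighted adjacency matrix W without isolated nodes, uses the normalization D^{-1/2} W D^{-1/2} with D the diagonal degree matrix, and clusters the rows with a small hand-written k-means; `kmeans` and `spectral_clustering` are hypothetical helper names.

```python
import numpy as np


def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization (illustrative helper)."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance of every row to its nearest chosen center
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def spectral_clustering(W, k):
    """Normalized adjacency -> k dominant eigenvectors -> k-means on rows."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    W_norm = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(W_norm)   # eigenvalues in ascending order
    return kmeans(vecs[:, -k:], k)     # columns of the k largest eigenvalues


# Toy graph: two 4-cliques joined by the single edge (3, 4).
W = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
W[3, 4] = W[4, 3] = 1.0
labels = spectral_clustering(W, 2)
print(labels)
```

On this toy graph the sketch separates the two cliques. The farthest-point initialization only makes the example deterministic; practical implementations usually run k-means++ with several restarts.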


Spectral Clustering (in practice)

[Figure: the spectral clustering pipeline of the previous slide, illustrated in practice]


Theoretical analysis

Stochastic block model: [Holland, Laskey & Leinhardt '83]

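Random graph (V, E) on |V| = n nodes

Nodes have (hidden) class labels, $\psi : \{1, \ldots, n\} \to \{1, \ldots, k\}$

$P(e_{uv} \in E)$ depends on the labels of u and v

Question:

$\mathrm{Error}(\psi, \psi') = \min_{\sigma} \sum_{i=1}^{n} \mathbb{1}\{\psi_i \neq \sigma(\psi'_i)\}$, where ψ′ is the output labeling and σ ranges over permutations of {1, …, k}

Find $\beta_n$ such that

$\mathrm{Error}(\psi, \psi') \leq \beta_n$ with probability $1 - o(1)$

Consistency of algorithms:

Weakly consistent if $\beta_n = o(n)$; strongly consistent if $\beta_n = o(1)$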

Spectral clustering is weakly consistent [Rohe, Chatterjee & Yu '11]
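The error measure can be computed by brute force over the k! relabelings. A minimal sketch, assuming labels are integers 0, …, k−1 (so σ is a permutation of range(k)); `clustering_error` is our own name, and for large k one would solve the matching as an assignment problem instead.

```python
from itertools import permutations


def clustering_error(psi, psi_out, k):
    """Error(psi, psi') = min over permutations sigma of #{i : psi_i != sigma(psi'_i)}.

    Brute force over all k! permutations, so only suitable for small k.
    """
    best = len(psi)
    for sigma in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(psi, psi_out) if a != sigma[b])
        best = min(best, mismatches)
    return best


print(clustering_error([0, 0, 1, 1], [1, 1, 0, 0], k=2))  # 0: same partition, labels swapped
print(clustering_error([0, 0, 1, 1], [1, 1, 0, 1], k=2))  # 1: one node misclustered
```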


Hypergraph Partitioning: Applications and Algorithms


Hypergraphs

Collection of sets / Generalization of graphs

Each edge can connect more than two nodes

[Figure: a graph (2-uniform hypergraph) and a 3-uniform hypergraph]

m-uniform hypergraph:

Each edge connects m nodes

Adjacencies can be represented by an mth-order tensor:

$A_{i_1 i_2 \ldots i_m} = \begin{cases} 1 & \text{if there is an edge on } i_1, i_2, \ldots, i_m \\ 0 & \text{otherwise} \end{cases}$
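A small sketch of the adjacency tensor of an m-uniform hypergraph, assuming 0-based node indices and an edge list of m-tuples; `adjacency_tensor` is a hypothetical helper. Dense storage needs n^m entries, so this is only for toy sizes.

```python
import itertools

import numpy as np


def adjacency_tensor(n, edges, m):
    """Symmetric m-th order adjacency tensor of an m-uniform hypergraph.

    A[i1, ..., im] = 1 if {i1, ..., im} is an edge, else 0. Every ordering of an
    edge's nodes is set, so the tensor is invariant under index permutations.
    """
    A = np.zeros((n,) * m)
    for e in edges:
        assert len(set(e)) == m, "each edge must contain m distinct nodes"
        for idx in itertools.permutations(e):
            A[idx] = 1.0
    return A


edges = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]
A = adjacency_tensor(6, edges, m=3)
print(A[0, 1, 2], A[2, 0, 1], A[0, 0, 1])  # 1.0 1.0 0.0
print(A.sum())                              # 18.0 = 3 edges x 3! orderings
```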


Hypergraphs in Databases [Gibson, Kleinberg & Raghavan '00]

Gender:   Male   Female  Male   Male   Female
Hair:     Red    Black   Bald   Black  Red
Glasses:  Yes    No      Yes    No     No

Edges can be of varying sizes (non-uniform hypergraph)

Edges: Male, Black hair, Without glasses, and so on; each attribute value defines an edge containing every record with that value
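One way to turn the table into a non-uniform hypergraph, reading each attribute value as an edge on the records that share it. This is a sketch; the record encoding and the function name `categorical_hyperedges` are ours.

```python
from collections import defaultdict

# The toy records from the slide: one dict per record (person).
records = [
    {"Gender": "Male",   "Hair": "Red",   "Glasses": "Yes"},
    {"Gender": "Female", "Hair": "Black", "Glasses": "No"},
    {"Gender": "Male",   "Hair": "Bald",  "Glasses": "Yes"},
    {"Gender": "Male",   "Hair": "Black", "Glasses": "No"},
    {"Gender": "Female", "Hair": "Red",   "Glasses": "No"},
]


def categorical_hyperedges(records):
    """Map each (attribute, value) pair to the set of record indices having that value."""
    edges = defaultdict(set)
    for i, rec in enumerate(records):
        for attr, value in rec.items():
            edges[(attr, value)].add(i)
    return dict(edges)


edges = categorical_hyperedges(records)
for key, members in sorted(edges.items()):
    print(key, sorted(members))
```

The edge sizes range from 1 (Bald) to 3 (Male, Without glasses), which is what makes this hypergraph non-uniform.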


Hypergraphs in Computer Vision [Agarwal et al. '05]

Subspace clustering

Motion segmentation

Matching / image registration

Involves 3-way / 4-way similarities (uniform hypergraph)


Hypergraph Partitioning Methods

Partitioning circuits [Schweikert & Kernighan '79]

Graph approximation for hypergraphs [Hadley '95]

Spectral hypergraph partitioning [Zien et al. '99]

hMETIS for VLSI design [Karypis & Kumar '00]

Uniform hypergraph in databases [Gibson et al. '00]

Uniform hypergraph in vision [Agarwal et al. '05]

Tensor based algorithms [Govindu '05; Chen & Lerman '09]

Learning with non-uniform hypergraph [Zhou et al. '07]

Higher order learning [Duchenne et al. '11; Rota Bulo & Pelillo '13; etc.]


Algorithms studied in our work

HOSVD / SCC: [Govindu '05; Chen & Lerman '09]

Uniform hypergraph partitioning using higher order SVD of the adjacency tensor.

TTM / TeTrIS: (proposed)

Uniform hypergraph partitioning by solving a tensor trace maximization problem.

TeTrIS is an efficient (sampled) version of TTM.

NH-Cut: [Zhou, Huang & Scholkopf '07]

Non-uniform hypergraph partitioning by minimizing the normalized hypergraph cut.

COLOR: (proposed)

Vertex 2-coloring of bipartite non-uniform hypergraphs.


Uniform Hypergraph Partitioning: Spectral Algorithms

Approach 1: Higher order SVD of adjacency tensor

Approach 2: Associativity or tensor trace maximization


Approach 1: Higher Order SVD

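Matrix eigendecomposition:

$A = U \Sigma U^T$, with U orthonormal and Σ diagonal

HOSVD of a 3rd-order tensor: [De Lathauwer et al. '00]

$\mathcal{A} = \Sigma \times_1 U \times_2 U \times_3 U$, with U orthonormal and Σ a core tensor (not diagonal in general)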


HOSVD based Partitioning [Govindu '05]

Pipeline: m-uniform hypergraph → adjacency tensor $\mathcal{A}$ → flattened matrix A (of size $n \times n^{m-1}$) → k dominant left singular vectors → k-means on the rows
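A toy NumPy sketch of the HOSVD-based pipeline, under our own assumptions: the tensor is stored densely, the flattening is the mode-1 unfolding (n × n^{m−1}), and k-means is a small hand-written helper with deterministic initialization. `hosvd_partition` is a hypothetical name, not Govindu's code.

```python
import itertools

import numpy as np


def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization (illustrative helper)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def hosvd_partition(A, k):
    """Mode-1 flattening -> k dominant left singular vectors -> k-means on rows."""
    n = A.shape[0]
    A_flat = A.reshape(n, -1)                         # n x n^(m-1)
    U, _, _ = np.linalg.svd(A_flat, full_matrices=False)  # singular values descending
    return kmeans(U[:, :k], k)


# Toy 3-uniform hypergraph: all triples inside {0, ..., 3} and inside {4, ..., 7}.
n, m = 8, 3
A = np.zeros((n,) * m)
for cluster in (range(0, 4), range(4, 8)):
    for e in itertools.combinations(cluster, 3):
        for idx in itertools.permutations(e):
            A[idx] = 1.0
labels = hosvd_partition(A, 2)
print(labels)
```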


Approach 2: Associativity Maximization

Normalized associativity:

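For any cluster $V_1 \subset V$: $\mathrm{associativity}(V_1) = \sum_{e \subset V_1} w(e)$ and $\mathrm{volume}(V_1) = \sum_{v \in V_1} \mathrm{degree}(v)$

Normalized associativity of a partition:

$\text{N-assoc}(V_1, \ldots, V_k) = \sum_{j=1}^{k} \frac{\mathrm{associativity}(V_j)}{\mathrm{volume}(V_j)}$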

Problem:

Find partition that maximizes N-assoc(V1, . . . ,Vk)
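A direct evaluation of N-assoc for a small weighted hypergraph, taking degree(v) = Σ_{e ∋ v} w(e). Edges are stored as frozensets mapped to their weights; `n_assoc` is our own helper and the weights are made up for illustration.

```python
def n_assoc(weights, partition):
    """Normalized associativity of a partition of a weighted hypergraph.

    weights: dict mapping frozenset(edge) -> w(e)
    partition: list of disjoint sets V_1, ..., V_k
    """
    degree = {}
    for e, w in weights.items():
        for v in e:
            degree[v] = degree.get(v, 0.0) + w
    total = 0.0
    for V in partition:
        assoc = sum(w for e, w in weights.items() if e <= V)  # edges fully inside V
        volume = sum(degree.get(v, 0.0) for v in V)
        total += assoc / volume
    return total


weights = {frozenset({0, 1, 2}): 1.0, frozenset({3, 4, 5}): 1.0, frozenset({2, 3, 4}): 0.5}
print(n_assoc(weights, [{0, 1, 2}, {3, 4, 5}]))  # 1/3.5 + 1/4, about 0.5357
print(n_assoc(weights, [{0, 1, 2, 3, 4, 5}]))    # 2.5/7.5, about 0.3333
```

The planted 2-way split scores higher than the trivial one-cluster partition, as the maximization objective intends.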


Tensor Trace Maximization (TTM)

Problem (reformulated):

For m-uniform hypergraph

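$\text{N-assoc}(V_1, \ldots, V_k) = \frac{1}{m!}\,\mathrm{Trace}\left(A \times_1 Y^{b_1} \times_2 \cdots \times_m Y^{b_m}\right)$

where $Y \in \mathbb{R}^{n \times k}$ has orthogonal columns and $\sum_j b_j = 1$

For m = 2 this reads $\mathrm{Trace}\left((Y^{b_1})^T A\, Y^{b_2}\right)$

Spectral relaxation of TTM:

Set $b_1 = b_2 = \frac{1}{2}$, $b_3 = \cdots = b_m = 0$ and $X = Y^{1/2}$

Optimize over all orthonormal X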
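A numerical check of the reformulation for m = 3 with b_1 = b_2 = b_3 = 1/3. The slide does not spell out Y; here we assume Y_ij = 1/volume(V_j) if i ∈ V_j and 0 otherwise (disjoint supports, hence orthogonal columns), with Y^b taken entrywise. Under this assumption the trace identity can be verified directly; the toy weights are made up.

```python
import itertools
import math

import numpy as np

# Toy weighted 3-uniform hypergraph and a 2-way partition.
weights = {(0, 1, 2): 1.0, (3, 4, 5): 1.0, (2, 3, 4): 0.5}
partition = [[0, 1, 2], [3, 4, 5]]
n, m, k = 6, 3, 2

# Symmetric weighted adjacency tensor and degree(v) = sum of w(e) over edges e containing v.
A = np.zeros((n,) * m)
degree = np.zeros(n)
for e, w in weights.items():
    for idx in itertools.permutations(e):
        A[idx] = w
    degree[list(e)] += w

# ASSUMED form of Y: Y[i, j] = 1 / volume(V_j) if i is in V_j, else 0.
Y = np.zeros((n, k))
for j, V in enumerate(partition):
    Y[V, j] = 1.0 / degree[V].sum()

b = (1 / 3, 1 / 3, 1 / 3)          # b_1 + b_2 + b_3 = 1
Z = [Y ** bj for bj in b]          # entrywise powers; zeros stay zero since bj > 0
# Trace(A x_1 Z1 x_2 Z2 x_3 Z3) = sum_j sum_{a,b,c} A[a,b,c] Z1[a,j] Z2[b,j] Z3[c,j]
trace = np.einsum("abc,aj,bj,cj->j", A, Z[0], Z[1], Z[2]).sum()
lhs = trace / math.factorial(m)

# Direct N-assoc: sum over clusters of associativity / volume.
rhs = sum(
    sum(w for e, w in weights.items() if set(e) <= set(V)) / degree[V].sum()
    for V in partition
)
print(lhs, rhs)  # the two values agree (about 0.5357)
```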


Spectral TTM Algorithm [Ghoshdastidar & Dukkipati, ICML'15]

Pipeline: m-uniform hypergraph → adjacency tensor $\mathcal{A}$ → add the slices of the tensor to obtain a matrix A → k dominant eigenvectors → k-means on the rows
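A sketch of the spectral TTM pipeline on a toy 3-uniform hypergraph. Summing the tensor over modes 3, …, m implements "add slices of tensor". The slide does not say how the matrix is normalized; we assume D^{-1/2} A D^{-1/2}, in line with the "normalized E[A]" in the TTM consistency theorem. `ttm_partition` and `kmeans` are hypothetical helpers.

```python
import itertools

import numpy as np


def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization (illustrative helper)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def ttm_partition(A_tensor, k):
    """Add tensor slices -> (assumed) normalization -> k dominant eigenvectors -> k-means."""
    m = A_tensor.ndim
    A = A_tensor.sum(axis=tuple(range(2, m)))               # n x n matrix
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # ASSUMED normalization
    _, vecs = np.linalg.eigh(A_norm)                        # ascending eigenvalues
    return kmeans(vecs[:, -k:], k)


# Toy 3-uniform hypergraph: all triples inside {0, ..., 3} and inside {4, ..., 7}.
n, m = 8, 3
A_tensor = np.zeros((n,) * m)
for cluster in (range(0, 4), range(4, 8)):
    for e in itertools.combinations(cluster, 3):
        for idx in itertools.permutations(e):
            A_tensor[idx] = 1.0
labels = ttm_partition(A_tensor, 2)
print(labels)
```

Compared with the HOSVD route, only an n × n matrix is decomposed here instead of an n × n^{m−1} flattening.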


Uniform Hypergraph Partitioning: Consistency

Planted partition model for uniform hypergraphs

Error bounds for algorithms


Planted Partition Model (graph)

Sparse Stochastic Block Model: [Lei & Rinaldo '15]

Given n nodes, and k (hidden) classes

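An unknown symmetric matrix $B \in [0,1]^{k \times k}$

An unknown sparsity factor $\alpha_n$

Independent edges with probabilities depending on labels

[Figure: nodes of Class-1, Class-2 and Class-3]

Prob(edge between two Class-1 nodes) = $\alpha_n B_{11}$, Prob(edge between a Class-1 and a Class-2 node) = $\alpha_n B_{12}$, Prob(edge between a Class-1 and a Class-3 node) = $\alpha_n B_{13}$, …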
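A minimal sampler for the sparse SBM, assuming 0-based class labels and NumPy's `Generator` API; `sample_sbm` is our own name and the parameter values are illustrative.

```python
import numpy as np


def sample_sbm(labels, B, alpha, rng):
    """Sample a symmetric 0/1 adjacency matrix from the sparse SBM.

    Each pair u < v is an edge independently with probability
    alpha * B[labels[u], labels[v]]; there are no self-loops.
    """
    labels = np.asarray(labels)
    n = len(labels)
    P = alpha * B[labels[:, None], labels[None, :]]  # n x n edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)    # independent draws for u < v
    return (upper | upper.T).astype(int)


rng = np.random.default_rng(0)
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
B = np.array([[0.9, 0.1, 0.1],
              [0.1, 0.9, 0.1],
              [0.1, 0.1, 0.9]])
A = sample_sbm(labels, B, alpha=0.5, rng=rng)
print(A)
```

Shrinking `alpha` with n keeps the block structure B fixed while making the graph sparser, which is the role of $\alpha_n$ in the model.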


Planted Partition Model (uniform hypergraph)

Extension of Sparse SBM: (proposed)

Given n nodes, and k (hidden) classes

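Unknown mth-order tensor $B \in [0,1]^{k \times k \times \cdots \times k}$

Unknown sparsity factor $\alpha_n$

Independent edges with label-dependent distribution; below, $i_1, \ldots, i_m$ denote the class labels of the m nodes of an edge

Unweighted hypergraph: Prob(edge) = $\alpha_n B_{i_1 i_2 \ldots i_m}$

Weighted hypergraph: $w(\text{edge}) \in [0,1]$ and $E[w(\text{edge})] = \alpha_n B_{i_1 i_2 \ldots i_m}$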
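The unweighted uniform model can be sampled by visiting every m-subset of nodes, which costs O(n^m) and is only meant for small n. We assume B is symmetric under permutations of its indices and index it by the class labels of the edge's nodes. The helpers `sample_planted_uniform` and `homogeneous_B` are hypothetical; `homogeneous_B` builds a tensor with p on the entries B_{j…j} and q elsewhere.

```python
import itertools

import numpy as np


def sample_planted_uniform(labels, B, alpha, rng):
    """Sample an unweighted m-uniform planted hypergraph as a list of m-tuples.

    Each m-subset {i1, ..., im} is an edge independently with probability
    alpha * B[labels[i1], ..., labels[im]], where B is an m-th order tensor.
    """
    m = B.ndim
    edges = []
    for e in itertools.combinations(range(len(labels)), m):
        p = alpha * B[tuple(labels[i] for i in e)]
        if rng.random() < p:
            edges.append(e)
    return edges


def homogeneous_B(k, m, p, q):
    """Tensor with p when all m labels agree and q otherwise."""
    B = np.full((k,) * m, q)
    for j in range(k):
        B[(j,) * m] = p
    return B


rng = np.random.default_rng(1)
labels = [0] * 5 + [1] * 5
edges = sample_planted_uniform(labels, homogeneous_B(2, 3, p=0.8, q=0.05), alpha=1.0, rng=rng)
print(len(edges), edges[:5])
```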


Consistency of HOSVD [Ghoshdastidar & Dukkipati, NIPS, 2014]

Define:

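$n_{\max}$ ($n_{\min}$) = maximum (minimum) cluster size

$\mathcal{A} = E[AA^T]$ and $\mathcal{A}_{\min} = \min_{i,j} \{\mathcal{A}_{ij} : \mathcal{A}_{ij} > 0\}$

δ = kth eigen-gap of the normalized $\mathcal{A}$

Theorem

There exists a constant C > 0 such that, if

$\delta > 0 \quad\text{and}\quad \mathcal{A}_{\min} > \frac{C k n_{\max} (\log n)^2}{n_{\min} \delta^2},$

then with probability $1 - o(1)$,

$\mathrm{Error}(\psi, \psi') = O\left(\frac{k n_{\max} \log n}{\delta^2 \mathcal{A}_{\min}}\right) = o(n).$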


Consistency of TTM [Ghoshdastidar & Dukkipati, ICML, 2015]

Define:

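$d = \min_i E[\mathrm{degree}(i)] = \min_i \sum_{e \ni i} E[w(e)]$

δ = kth eigen-gap of the normalized E[A]

Theorem

There exists a constant C > 0 such that, if

$\delta > 0 \quad\text{and}\quad d > \frac{C k n_{\max} (\log n)^2}{n_{\min} \delta^2},$

then with probability $1 - o(1)$,

$\mathrm{Error}(\psi, \psi') = O\left(\frac{k n_{\max} \log n}{\delta^2 d}\right) = o(n).$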


Special Case

m-uniform hypergraph

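k = O(log n) clusters of equal size

Edge probabilities:

$\mathrm{Prob}(\text{edge}) = \begin{cases} \alpha_n p & \text{if the edge lies within a cluster} \\ \alpha_n q & \text{otherwise} \end{cases} \qquad (p > q)$

HOSVD vs. TTM:

Allowable sparsity: HOSVD requires $\alpha_n = \Omega\left(\frac{(\log n)^{m+1.5}}{n^{(m-1)/2}}\right)$; TTM requires $\alpha_n = \Omega\left(\frac{(\log n)^{2m+1}}{n^{m-1}}\right)$

Dense hypergraph ($\alpha_n = 1$): HOSVD error $= O\left(\frac{(\log n)^{2m+1}}{n^{m-2}}\right)$; TTM error $= O\left(\frac{(\log n)^{2m-1}}{n^{m-2}}\right)$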


Non-uniform Hypergraph Partitioning: Algorithm and Consistency

Approach 3: Normalized hypergraph cut minimization

Planted partition model for non-uniform hypergraphs

Consistency result (with proof sketch)


Normalized Hypergraph Cut

Approach: [Zhou, Huang & Scholkopf '07]

Solve spectral relaxation of minimizing normalized hypergraph cut

Reduction to graph:

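$A, D \in \mathbb{R}^{n \times n}$ such that $A_{ij} = \sum_{e \ni i,j} \frac{1}{|e|}$ and $D_{ii} = \mathrm{degree}(i)$

Spectral clustering:

Normalized Laplacian, $L = I - D^{-1/2} A D^{-1/2}$

Compute k orthonormal eigenvectors of L corresponding to its k smallest eigenvalues

Run k-means on the normalized rows of the eigenvector matrix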
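A sketch of the NH-Cut reduction and spectral step. We read the i = j case of A_ij as included, so that A is the sum of (1/|e|) h_e h_e^T over edges and the row sums of A equal degree(i), the number of edges containing i, for an unweighted hypergraph; this reading is our assumption. `nh_cut` and `kmeans` are hypothetical helpers, and the toy example has two internally connected but mutually disconnected clusters.

```python
import numpy as np


def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization (illustrative helper)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def nh_cut(n, edges, k):
    """Graph reduction -> normalized Laplacian -> k smallest eigenvectors -> k-means."""
    A = np.zeros((n, n))
    for e in edges:
        h = np.zeros(n)
        h[list(e)] = 1.0
        A += np.outer(h, h) / len(e)     # A_ij = sum over edges containing i and j of 1/|e|
    deg = A.sum(axis=1)                  # number of edges containing each node
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # ascending eigenvalues
    X = vecs[:, :k]                      # eigenvectors of the k smallest eigenvalues
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # row normalization
    return kmeans(X, k)


# Toy non-uniform hypergraph with clusters {0, ..., 3} and {4, ..., 7}.
edges = [(0, 1), (1, 2, 3), (0, 2, 3), (0, 1, 2, 3),
         (4, 5), (5, 6, 7), (4, 6, 7), (4, 5, 6, 7)]
labels = nh_cut(8, edges, 2)
print(labels)
```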


Planted Partition Model (non-uniform hypergraph)

Model: (proposed)

Given n nodes, and k (hidden) classes

Maximum edge cardinality M

Unknown mth-order tensors $B^{(m)} \in [0,1]^{k \times k \times \cdots \times k}$

Unknown sparsity factors $\alpha_{m,n}$, $m = 2, 3, \ldots, M$

Independent edges with label-dependent distribution

Prob(m-edge) = $\alpha_{m,n} B^{(m)}_{i_1 i_2 \ldots i_m}$


Consistency of NH-Cut [Ghoshdastidar & Dukkipati, Ann. Stat., 2017]

Define:

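$\mathcal{A} = E[A]$, $\mathcal{D} = E[D]$ and $\mathcal{L} = I - \mathcal{D}^{-1/2} \mathcal{A} \mathcal{D}^{-1/2}$

$d = \min_i E[\mathrm{degree}(i)]$

δ = kth eigen-gap of $\mathcal{L}$

Theorem

There exists a constant C > 0 such that, if

$\delta > 0 \quad\text{and}\quad d > \frac{C k n_{\max} (\log n)^2}{n_{\min} \delta^2},$

then with probability $1 - o(1)$,

$\mathrm{Error}(\psi, \psi') = O\left(\frac{k n_{\max} \log n}{\delta^2 d}\right) = o(n).$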


Proof of consistency

Stage 1: (expected case)

If δ > 0, then $\mathcal{A}$ is essentially of rank k

If $\mathcal{A}$ were used instead of A, then Error = 0

Stage 2: (matrix concentration)

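A can be expressed as a sum of independent random matrices:

$A = \sum_{e \in 2^V} \mathbb{1}\{e \in E\} \left(\frac{1}{|e|} h_e h_e^T\right)$, where $h_e \in \{0,1\}^n$ is the indicator vector of e

If $d > 9 \log n$ for all large n, then with probability $1 - \frac{4}{n^2}$,

$\|L - \mathcal{L}\|_2 \leq 12 \sqrt{\frac{\log n}{d}}$

Proof uses matrix concentration inequality [Tropp '12]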
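A quick check, on made-up toy data, that the entrywise definition A_ij = Σ_{e ∋ i,j} 1/|e| agrees with the sum of rank-one terms (1/|e|) h_e h_e^T over the edges present. Under the planted model each indicator 1{e ∈ E} is independent, which is why this form suits matrix concentration.

```python
import numpy as np

n = 6
edges = [(0, 1), (1, 2, 3), (2, 3, 4, 5), (0, 5)]

# Entrywise definition (for i = j this sums over the edges containing i).
A_entrywise = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        A_entrywise[i, j] = sum(1.0 / len(e) for e in edges if i in e and j in e)

# Sum of rank-one terms, one per edge present in E.
A_sum = np.zeros((n, n))
for e in edges:
    h = np.zeros(n)
    h[list(e)] = 1.0                 # h_e: indicator vector of edge e
    A_sum += np.outer(h, h) / len(e)

print(np.allclose(A_entrywise, A_sum))  # True
print(A_sum.sum(axis=1))                # row sums = number of edges containing each node
```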


Proof of consistency

Stage 3: (matrix perturbation)

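X, $\mathcal{X}$: row-normalized eigenvector matrices of L, $\mathcal{L}$

If $\delta > 24 \sqrt{\frac{\log n}{d}}$ for all large n, then with probability $1 - \frac{4}{n^2}$,

$\|X - \mathcal{X}\|_F \leq \frac{24}{\delta} \sqrt{\frac{2 k n_{\max} \log n}{d}}$

Proof uses matrix perturbation theory [Davis & Kahan '70]

Stage 4: (analyzing k-means)

Rows of $\mathcal{X}$ are ε-separable for $\epsilon = (\log n)^{-1/2}$

k-means succeeds with probability $1 - o(1)$

$\mathrm{Error} = O\left(\|X - \mathcal{X}\|_F^2\right)$

Based on guarantees of k-means [Ostrovsky et al. '12]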


Sampling Hypergraph Edges

Consistency of partitioning with edge sampling

Approach 4: TTM with iterative sampling

Numerical comparison


Edge Sampling (weighted m-uniform hypergraph)

Complexity of tensor methods:

$O(n^m)$ runtime to compute all edge weights

Typically m = 3 to 8 in practice

Efficient variant: use only $N \ll n^m$ sampled edges

Question:

Edges sampled with replacement

Sampling distribution $(p_e)_{e \in E}$

Find min. number of samples needed for consistency

Sampling bound for TTM: [Ghoshdastidar & Dukkipati, arXiv:1602.06516]

(Special case) Error = o(n) if

Uniform sampling: $N = \Omega\left(\alpha_n^{-1} k^{2m-1} n (\log n)^2\right)$

Weighted sampling, $p_e \propto w(e)$: $N = \Omega\left(k^{2m-1} n (\log n)^2\right)$
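A sketch contrasting uniform and weighted sampling with replacement. The slide does not specify how sampled edges are combined; a natural choice, which is our assumption, is importance weighting by w(e)/p_e, giving an unbiased estimate of the matrix with entries Σ_{e ∋ i,j} w(e) (for m = 3 this is the sum of the tensor's slices). The ratio w(e)/p(e) also appears in the quantity γ of the general sampled-TTM theorem. `sampled_slice_matrix` is a hypothetical name and the weights are made up.

```python
import itertools

import numpy as np


def sampled_slice_matrix(n, edges, weights, N, p, rng):
    """Importance-weighted estimate of M[i, j] = sum of w(e) over edges e containing i and j.

    N edges are drawn with replacement from p. Each draw of edge e adds
    w(e) / (N * p_e) to M[i, j] for every ordered pair of distinct nodes i, j in e.
    """
    draws = rng.choice(len(edges), size=N, replace=True, p=p)
    M_hat = np.zeros((n, n))
    for t in draws:
        scale = weights[t] / (N * p[t])
        for i, j in itertools.permutations(edges[t], 2):
            M_hat[i, j] += scale
    return M_hat


rng = np.random.default_rng(0)
n = 6
edges = list(itertools.combinations(range(n), 3))  # all 20 triples
# Toy weights: 1.0 inside the clusters {0, 1, 2} and {3, 4, 5}, 0.1 for mixed triples.
weights = np.array([1.0 if len({v // 3 for v in e}) == 1 else 0.1 for e in edges])
N = 10
p_uniform = np.full(len(edges), 1.0 / len(edges))
p_weighted = weights / weights.sum()               # p_e proportional to w(e)
M_uniform = sampled_slice_matrix(n, edges, weights, N, p_uniform, rng)
M_weighted = sampled_slice_matrix(n, edges, weights, N, p_weighted, rng)
print(M_weighted.round(2))
```

With p_e ∝ w(e) every draw contributes the same total amount, which gives one intuition for why the weighted bound above drops the $\alpha_n^{-1}$ factor.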


TTM with Iterative Sampling (TeTrIS)

Iterative Sampling:

Principle:

Sample edges with large weight more frequently

Edges within a cluster usually have large weight

Approach (SCC): [Chen & Lerman '09]

Sample a few edges

Cluster using the HOSVD-based method

Re-sample with preference to within-cluster edges

Re-cluster and repeat until convergence

TeTrIS Algorithm: [proposed]

Replace the HOSVD step by TTM

The sampling bound for TTM justifies sampling large-weight edges via iterative sampling
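A sketch of the "re-sample with preference to within-cluster edges" step. For illustration we assume the simplest version: each new sample is a uniformly random m-subset lying inside one current cluster. This is our own simplification, not necessarily the exact rule used by SCC or TeTrIS, and `resample_within_clusters` is a hypothetical name.

```python
import math

import numpy as np


def resample_within_clusters(labels, m, N, rng):
    """Draw N m-subsets of nodes, each lying inside one current cluster.

    Clusters with fewer than m nodes are skipped. A cluster is chosen with probability
    proportional to C(|cluster|, m), so every within-cluster m-subset is equally likely.
    """
    labels = np.asarray(labels)
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    clusters = [c for c in clusters if len(c) >= m]
    counts = np.array([math.comb(len(c), m) for c in clusters], dtype=float)
    probs = counts / counts.sum()
    samples = []
    for j in rng.choice(len(clusters), size=N, p=probs):
        nodes = rng.choice(clusters[j], size=m, replace=False)
        samples.append(tuple(sorted(int(v) for v in nodes)))
    return samples


rng = np.random.default_rng(0)
current_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2]  # cluster 2 is too small for m = 3
new_edges = resample_within_clusters(current_labels, m=3, N=5, rng=rng)
print(new_edges)
```

In a full loop one would compute the weights of these sampled edges, re-run the partitioning step (HOSVD for SCC, TTM for TeTrIS) and repeat until the labels stop changing.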


Numerical Comparison

Motion Segmentation:

Cluster motion trajectories

Posed as subspace clustering problem

Each motion lies in a subspace of dimension ≤ 4

Mean clustering error on Hopkins 155 data set (%)

Method          2 motions      3 motions     All
                (120 videos)   (35 videos)
k-means         19.57          26.16         21.06
k-flats         13.05          15.78         13.67
SSC             1.53           4.40          2.18
LRR             2.13           4.03          2.56
NSN             3.62           8.28          4.67
SCC (HOSVD)     2.38           5.71          3.13
TeTrIS (TTM)    1.36           5.38          2.27


Hypergraph vertex 2-coloring

Objective: no edge may be monochromatic

Assume: planted bipartite hypergraph, M = O(1) and E[#edges] ≥ C n log n

Algorithm: [Ghoshdastidar & Dukkipati, arXiv:1507.00763]

Spectral step:

Let $A \in \mathbb{R}^{n \times n}$ with $A_{ij} = \sum_{e \ni i,j} \frac{1}{|e|}$

Compute the eigenvector x for the smallest eigenvalue of A

Color node i red if $x_i > 0$, else blue

⇒ Achieves error < cn for c ≪ 1

Iterative refinement:

Re-color node i red if $\sum_{j \in V_R} A_{ij} < \sum_{j \in V_B} A_{ij}$, else blue ($V_R$, $V_B$: currently red and blue nodes)

⇒ Error halves in each iteration ($\log_2 n$ steps suffice)
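A sketch of the two steps on a toy bipartite hypergraph with red side {0, 1, 2} and blue side {3, 4, 5}. Our assumptions: A has zero diagonal (only j ≠ i enters the sums, so a node's own color does not count in the refinement), red/blue are encoded as True/False, and updates are synchronous for ⌈log2 n⌉ rounds. `two_color` is a hypothetical name.

```python
import itertools
import math

import numpy as np


def two_color(n, edges):
    """Spectral 2-coloring followed by iterative refinement (True = red, False = blue)."""
    A = np.zeros((n, n))
    for e in edges:
        for i in e:
            for j in e:
                if i != j:                     # ASSUMPTION: diagonal left at zero
                    A[i, j] += 1.0 / len(e)
    _, vecs = np.linalg.eigh(A)                # ascending eigenvalues
    red = vecs[:, 0] > 0                       # sign of the smallest-eigenvalue eigenvector
    for _ in range(math.ceil(math.log2(n))):
        to_red = A[:, red].sum(axis=1)         # weight towards currently red nodes
        to_blue = A[:, ~red].sum(axis=1)
        red = to_red < to_blue                 # red if more weight goes to blue nodes
    return red


reds, blues = [0, 1, 2], [3, 4, 5]
edges = [(r, b) for r in reds for b in blues]
edges += [(r1, r2, b) for r1, r2 in itertools.combinations(reds, 2) for b in blues]
edges += [(r, b1, b2) for r in reds for b1, b2 in itertools.combinations(blues, 2)]
colors = two_color(6, edges)
print(colors)
```

The overall sign of an eigenvector is arbitrary, so the two sides may come out with swapped colors; either way no edge is monochromatic.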


Summary

Hypergraph partitioning can be done efficiently

First study of planted non-uniform hypergraphs

Literature considers only planted k-SAT / 2-coloring

Statistical analysis of tensor based methods

Popular in practice, but no known error bound

Removing the assumptions on k-means

First study of sampled spectral algorithms

Justification for iterative sampling


Further works & open questions

Extension to large-scale hypergraph partitioning: down-sampling of hypergraphs

Analysis of other approaches under the planted model: move-based strategies, optimization-based algorithms

Study of sparse planted hypergraphs: overlapping communities / degree heterogeneity, algorithmic barrier for partitioning [Angelini et al. '15; Florescu & Perkins '16]

Generalization of graph problems to hypergraphs: theoretical studies, applications


Thank You

Acknowledgment: The work was supported by a Google Ph.D. Fellowship in Statistical Learning Theory


References

Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D. & Belongie, S. (2005). In IEEE Computer Vision and Pattern Recognition 838-845.

Angelini, M. C., Caltagirone, F., Krzakala, F. & Zdeborova, L. (2015). In Annual Allerton Conference on Communication, Control, and Computing.

Chen, G. & Lerman, G. (2009). International Journal of Computer Vision 81(3) 317-330.

Davis, C. & Kahan, W. M. (1970). SIAM Journal on Numerical Analysis 7(1) 1-46.

De Lathauwer, L., De Moor, B. & Vandewalle, J. (2000). SIAM Journal on Matrix Analysis and Applications 21(4) 1253-1278.

Duchenne, O., Bach, F., Kweon, I.-S. & Ponce, J. (2011). IEEE Transactions on Pattern Analysis and Machine Intelligence 33(12) 2383-2395.

Florescu, L. & Perkins, W. (2016). In Conference on Learning Theory.

Ghoshdastidar, D. & Dukkipati, A. (2014). In Advances in Neural Information Processing Systems 397-405.


Ghoshdastidar, D. & Dukkipati, A. (2015). In International Conference on Machine Learning.

Ghoshdastidar, D. & Dukkipati, A. (2017). Annals of Statistics (in press).

Ghoshdastidar, D. & Dukkipati, A. (2015). arXiv preprint 1507.00763.

Ghoshdastidar, D. & Dukkipati, A. (2016). arXiv preprint 1602.06516.

Gibson, D., Kleinberg, J. & Raghavan, P. (2000). VLDB Journal 8 222-236.

Govindu, V. M. (2005). In IEEE Computer Vision and Pattern Recognition 1150-1157.

Hadley, S. W. (1995). Discrete Applied Mathematics 59 115-127.

Hein, M., Setzer, S., Jost, L. & Rangapuram, S. (2013). In Advances in Neural Information Processing Systems 2427-2435.

Holland, P. W., Laskey, K. B. & Leinhardt, S. (1983). Social Networks 5 109-137.

Karypis, G. & Kumar, V. (2000). VLSI Design 11 285-300.

Lei, J. & Rinaldo, A. (2015). Annals of Statistics 43 215-237.


Ostrovsky, R., Rabani, Y., Schulman, L. J. & Swamy, C. (2012). Journal of the ACM 59(6) 28:128.

Rohe, K., Chatterjee, S. & Yu, B. (2011). Annals of Statistics 39 1878-1915.

Rota Bulo, S. & Pelillo, M. (2013). IEEE Transactions on Pattern Analysis and Machine Intelligence 35(6) 1312-1327.

Schweikert, G. & Kernighan, B. W. (1979). In Design Automation Workshop 57-62.

Tropp, J. A. (2012). Foundations of Computational Mathematics 12(4) 389-434.

Zien, J. Y., Schlag, M. D. F. & Chan, P. K. (1999). IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 13(9) 1088-1096.

Zhou, D., Huang, J. & Scholkopf, B. (2007). In Advances in Neural Information Processing Systems 1601-1608.


Consistency of Sampled TTM (General Case)

Define:

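$\gamma = \max_e \frac{w(e)}{p(e)}$, where $p(e) = P(e \text{ is sampled})$

$d = \min_i E[\mathrm{degree}(i)]$, δ = kth eigen-gap of the normalized E[A]

Theorem [Ghoshdastidar & Dukkipati '16]

There exist constants C, C′ > 0 such that, if

$\delta > 0, \quad d > \frac{C k n_{\max} (\log n)^2}{n_{\min} \delta^2} \quad\text{and}\quad N > C'\left(1 + \frac{2\gamma}{d}\right) \frac{k n_{\max} (\log n)^2}{n_{\min} \delta^2},$

then with probability $1 - o(1)$,

$\mathrm{Error}(\psi, \psi') = o(n).$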


More Numerical Results


Numerical Comparison (uniform hypergraph)

Subspace clustering

60 points in 5-dim ambient space

Data from union of three random lines (1-dim subspaces)

Data perturbed by Gaussian noise of standard deviation σa

Fractional error (over 20 runs)

Algorithm   σa = 0.02   σa = 0.05
SNTF        0.025       0.086
hMETIS      0.045       0.118
HGT         0.083       0.222
HOSVD       0.052       0.126
TTM         0.033       0.103


Numerical Comparison (sampled uniform hypergraph)

Subspace clustering

5-dim ambient space

Data from union of five 3-dim subspaces (added noise)

[Figure: fractional error (over 50 runs) as a function of the noise level σa and the number of points in each subspace, n/k]


Numerical Comparison (non-uniform hypergraph)

Categorical data clustering

Data set         #instances   #attributes   #attr. values
Voting records   435          16            3
Mushroom         8124         22            varies

Fractional error

Data set   ROCK   CoolCat   LIMBO   hMETIS   NH-Cut
Voting     0.16   0.15      0.13    0.24     0.12
Mushroom   0.43   0.27      0.11    0.48     0.11
