Data Mining with Graphs and Matrices

Fei Wang¹, Tao Li¹, Chris Ding²

¹ School of Computing and Information Sciences, Florida International University

² Department of Computer Science and Engineering, University of Texas at Arlington

Tutorial at SDM 2009, Sparks, Nevada

http://feiwang03.googlepages.com/sdm-tutorial


Table of Contents

1 Graphs and Matrices are Everywhere

2 Unsupervised Learning with Graphs & Matrices
    Dimensionality Reduction
    Clustering
    Co-Clustering

3 Semi-supervised Learning with Graphs & Matrices
    Semi-supervised Learning with Partially Labeled Data
    Semi-supervised Learning Using Side-Information

4 Future Research Directions


Internet Graph


Citation Graph


Friendship Graph


Airline Graph


Protein Interaction Graph


Social Network Analysis

Email network

Represents the email communications between users

Tasks: cluster users; identify communities


Document-Term Matrices

A collection of documents is represented by an nDoc-by-nTerm matrix (bag-of-words model).

Cluster or classify documents

Find a subset of terms that (accurately) clusters or classifies documents


Recommendation Systems

Collaborative filtering

Given the users’ historical data, predict the rating a specific user would give to a new movie


Bioinformatics

Gene expression data

Pick a subset of genes (if one exists) that suffices to identify the “cancer type” of a patient


Some Notations & Preliminaries

The data matrix X = [x_1, x_2, ..., x_n] ∈ R^{d×n}

Generally, a graph G = ⟨V, E⟩ can be described as a matrix:

The columns and rows are indexed by V

The elements are the strengths of the corresponding edges in E

Analyzing graphs is therefore usually equivalent to performing analysis on matrices
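This graph-matrix correspondence can be sketched with a toy example (hypothetical edge list; numpy used for illustration):

```python
import numpy as np

# Toy undirected graph: vertex set V = {0, 1, 2, 3} and weighted edges E.
# (Hypothetical example data, not from the tutorial.)
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 2.0), (0, 3, 1.5)]

n = 4
W = np.zeros((n, n))          # rows and columns are indexed by V
for i, j, w in edges:         # entries are the edge strengths in E
    W[i, j] = w
    W[j, i] = w               # symmetric for an undirected graph

# Matrix operations now recover graph quantities, e.g. node degrees.
degrees = W.sum(axis=1)
```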


Singular Value Decomposition

X = [x_1, x_2, ..., x_n] ≈ U_k S_k V_k^T = Σ_{i=1}^k s_i u_i v_i^T

Best rank-k approximation in Frobenius norm

Exact computation of the SVD takes O(min(dn², d²n)) time

The top k left/right singular vectors/values can be computed faster using Lanczos/Arnoldi methods
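A minimal numpy sketch of the truncated SVD and the Eckart-Young optimality property, on a random toy matrix (d = 20, n = 50 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))            # d x n toy data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 5
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation

# Eckart-Young: no rank-k matrix is closer to X in Frobenius norm, and
# the error equals the energy in the discarded singular values.
err = np.linalg.norm(X - Xk, "fro")
expected = np.sqrt((s[k:] ** 2).sum())
```

For large sparse matrices, an iterative Lanczos-type solver (e.g. `scipy.sparse.linalg.svds`) computes only the top k singular triplets instead of the full decomposition.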


Latent Semantic Analysis

Uncovers a k-dimensional latent semantic structure of the term-by-document matrix X via the truncated SVD:

    X ≈ U_k S_k V_k^T

U_k: LSA term vectors;  S_k: singular values;  V_k^T: LSA document vectors

Similarity in the reduced space: doc-doc (D-D), doc-term (D-T), term-term (T-T)

Folding-in queries: q̂ = S_k^{-1} U_k^T q


Principal Component Analysis

Find a projection vector u ∈ R^{d×1} such that the projected data points y = u^T X have the largest variance, i.e., we solve the following optimization problem:

    max_u  u^T [ (1/n) Σ_{i=1}^n (x_i − m)(x_i − m)^T ] u
    s.t.   ‖u‖² = 1                                            (1)

where m is the sample mean. By the standard Rayleigh-Ritz theorem, the optimal u is the eigenvector of the data covariance matrix C corresponding to its largest eigenvalue.
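Problem (1) can be checked numerically: the variance of the projected data equals the top eigenvalue of the covariance matrix (toy data; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 200))          # d x n, columns are data points
X[0] *= 5.0                                # give one direction large variance

m = X.mean(axis=1, keepdims=True)
C = (X - m) @ (X - m).T / X.shape[1]       # covariance matrix

evals, evecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
u = evecs[:, -1]                           # top eigenvector = first PC

# Variance of the projected (centered) data equals the top eigenvalue.
proj_var = ((u @ (X - m)) ** 2).mean()
```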


PCA & SVD

If X is centered, then the covariance matrix C = (1/n) X X^T

Eigenvalue decomposition: C = U Λ U^T

SVD of X: X = U S V^T, hence

    (1/n) X X^T = (1/n) U S V^T V S U^T = U (S²/n) U^T

Let Λ = S²/n; then PCA = SVD
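A quick numerical check of the PCA = SVD correspondence on toy centered data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 100))
X -= X.mean(axis=1, keepdims=True)        # center X

n = X.shape[1]
C = X @ X.T / n                           # covariance of centered data

evals = np.linalg.eigvalsh(C)[::-1]       # PCA eigenvalues, descending
s = np.linalg.svd(X, compute_uv=False)    # singular values of X, descending

# Lambda = S^2 / n: PCA eigenvalues equal squared singular values over n.
```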


Nonlinear Embedding

PCA is a linear method for projecting data points. What should we do if the data are nonlinearly distributed?


Manifold & Graph

We usually assume that the high-dimensional data points reside (nearly) on a low-dimensional nonlinear manifold.

Goal: find low-dimensional embeddings of the data which preserve the graph structure.


Local Linear Embedding (LLE)

Assume each data point can be linearly reconstructed from its neighborhood, i.e., for each x_i we minimize

    ε_i = ‖x_i − Σ_{x_j ∈ N_i} w_ij x_j‖²
    s.t.  Σ_j w_ij = 1                                  (2)

Then we use all {w_ij} to recover the low-dimensional embedding of the data points by solving

    min_Y  J = Σ_{i=1}^n ‖y_i − Σ_{y_j ∈ N_i} w_ij y_j‖²
    s.t.   Y Y^T = I                                    (3)

where Y = [y_1, ..., y_n] is the low-dimensional embedded data matrix.
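The weight-solving step (2) has a closed form via a local Gram matrix. A sketch for a single point with a hypothetical neighborhood N_i (the small ridge term is a standard numerical safeguard, not part of the slide's formulation):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 10))           # d x n toy data
i, nbrs = 0, [1, 2, 3, 4]                  # hypothetical neighborhood N_i

# Local Gram matrix of the neighbors centered on x_i.
Z = X[:, nbrs] - X[:, [i]]
G = Z.T @ Z + 1e-8 * np.eye(len(nbrs))     # tiny ridge for stability

# Solve G w = 1 and normalize: the constrained minimizer of (2).
w = np.linalg.solve(G, np.ones(len(nbrs)))
w /= w.sum()                               # enforce sum_j w_ij = 1

recon_err = np.linalg.norm(X[:, i] - X[:, nbrs] @ w)
```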


An Example of LLE


Laplacian Eigenmaps (LE)

The embedded data should be sufficiently smooth with respect to the intrinsic data manifold. We minimize

    min_Y  Σ_{i∼j} w_ij ‖y_i − y_j‖²
    s.t.   Y Y^T = I                                    (4)

where w_ij represents the similarity between x_i and x_j.

Writing in matrix form:

    Σ_{i∼j} w_ij ‖y_i − y_j‖² = tr(Y(D − W)Y^T)

where W(i, j) = w_ij is the similarity matrix and D = diag(Σ_j w_1j, ..., Σ_j w_nj).

We call L = D − W the Laplacian matrix.
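A minimal sketch of Laplacian eigenmaps on a toy 1-D data set: build a Gaussian similarity matrix W, form L = D − W, and take the eigenvector with the second-smallest eigenvalue as a 1-D embedding (the smallest eigenvalue is 0, with a constant eigenvector):

```python
import numpy as np

# Two hypothetical clusters on a line, weakly connected through W.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 1.0 ** 2))
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))
L = D - W                                  # graph Laplacian

evals, evecs = np.linalg.eigh(L)           # ascending eigenvalues
# The second eigenvector (Fiedler vector) separates the two clusters.
y = evecs[:, 1]
```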


Graph Similarities

Node similarities: s_ij = exp(−‖x_i − x_j‖² / (2σ²))


Locality Preserving Projections (LPP)

A linear version of Laplacian embedding. Let P be the projection matrix; the goal of LPP is to solve the following problem:

    min_P  tr(P^T X L X^T P)
    s.t.   P^T P = I                                    (5)

Applications: Locality Preserving Indexing, Laplacianfaces, ...


Graph Embedding: A General Framework

A general graph embedding framework:

    min_y  Σ_{i∼j} p_ij ‖y_i − y_j‖²
    s.t.   y^T A y = c                                  (6)

where i ∼ j denotes that there is an edge connecting x_i and x_j, and c is a constant.

Linearization (y = X^T p):

    min_p  Σ_{i∼j} p_ij ‖p^T x_i − p^T x_j‖²
    s.t.   p^T X A X^T p = c                            (7)


Summarization of Different Methods from a GE Perspective (Shuicheng Yan et al., CVPR’05)

    Algorithm   p_ij                                   A
    PCA         p_ij = 1/n, ∀ i ≠ j                    A = I
    LDA         p_ij = δ_{l_i, l_j} / n_{l_i}          A = I − ee^T
    LLE         P = W + W^T − W^T W                    A = I
    LPP         p_ij = exp(−‖x_i − x_j‖² / (2σ²))      A = D


K-means

The data points X come from C clusters. We aim to find the cluster centers {f_i}_{i=1}^C together with the clusters such that the following criterion is minimized:

    min  Σ_{i=1}^C Σ_{x_j ∈ π_i} ‖x_j − f_i‖²          (8)

where π_i denotes the i-th cluster. We can resort to iterative procedures to solve the problem.
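The standard iterative procedure (Lloyd's algorithm: alternate between assigning points to the nearest center and recomputing the centers) can be sketched on toy two-blob data:

```python
import numpy as np

rng = np.random.default_rng(5)
# Two hypothetical Gaussian blobs, 50 points each (rows are points here).
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

C = 2
f = X[[0, 99]].copy()                           # initialize centers from data

for _ in range(20):
    d = ((X[:, None, :] - f[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)                   # assign to nearest center
    f = np.array([X[labels == i].mean(axis=0)   # recompute centers
                  for i in range(C)])

obj = ((X - f[labels]) ** 2).sum()              # criterion (8)
```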


K-means Procedure

The figures come from http://www.ccs.neu.edu/home/rjw/csg220/lectures/k-means.pdf


Graph Clustering

Partition the nodes V of graph G into disjoint clusters

Cut: set of edges whose endpoints belong to different clusters

Association: set of edges whose endpoints belong to the same cluster


Graph Cut Criteria

MinCut: minimize the association between groups

    min  cut(A, B)

Normalized graph cut criteria:

    RatioAssociation:  max  asso(A, A)/|A| + asso(B, B)/|B|
    RatioCut:          min  cut(A, B)/|A| + cut(B, A)/|B|
    NormalizedCut:     min  cut(A, B)/vol(A) + cut(B, A)/vol(B)


Some Definitions on Graphs

Weight matrix W: W_ij is the weight on edge e_ij

Degree matrix D: D_ii = Σ_j W_ij

Partition matrix P: P_ij = 1 if x_i belongs to partition j; otherwise P_ij = 0

Scaled partition matrix P̃: P̃_ij = 1/√n_j if x_i belongs to partition j, where n_j is the size of the j-th cluster; otherwise P̃_ij = 0

The goal of graph clustering is to solve for P or P̃


Spectral Clustering

The solutions of the above optimization problems can be obtained by spectral analysis of certain matrices:

Ratio association: eigenvalue decomposition of W

Ratio cut: eigenvalue decomposition of L = D − W

Normalized cut: eigenvalue decomposition of L̂ = I − D^{-1/2} W D^{-1/2}
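A sketch of the normalized-cut case on a toy graph: the sign pattern of the second-smallest eigenvector of L̂ recovers the two communities (hypothetical edge weights; dense within communities, one weak bridge across):

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                    # weak bridge between communities

d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_hat = np.eye(6) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian

evals, evecs = np.linalg.eigh(L_hat)
# Thresholding the second eigenvector's sign partitions the nodes.
partition = evecs[:, 1] > 0
```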


Spectral Clustering II

Figure from Shi & Malik, PAMI 2000.


The Eigenvectors of The Normalized Laplacian Matrix


Nonnegative Matrix Factorization

Analyzing nonnegative matrices (document-word matrices, image matrices, ...)

For a nonnegative matrix X, decompose it into two nonnegative matrices:

    min_{F ≥ 0, G ≥ 0}  ‖X − FG^T‖²                     (9)

Multiplicative update rules solve the problem:

    F_ij ← F_ij (XG)_ij / (FG^T G)_ij,    G_ij ← G_ij (X^T F)_ij / (GF^T F)_ij

NMF yields a parts-based representation
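The multiplicative updates can be sketched directly (toy random data; the small epsilon guards against division by zero and is not part of the original rule):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.random((20, 30))                   # nonnegative data matrix
k = 4

F = rng.random((20, k)) + 0.1              # nonnegative initialization
G = rng.random((30, k)) + 0.1

def loss(X, F, G):
    return np.linalg.norm(X - F @ G.T, "fro") ** 2

before = loss(X, F, G)
for _ in range(200):                       # multiplicative updates (9)
    F *= (X @ G) / (F @ G.T @ G + 1e-12)
    G *= (X.T @ F) / (G @ F.T @ F + 1e-12)
after = loss(X, F, G)
```

Because the updates only multiply by nonnegative ratios, F and G stay nonnegative throughout, and the objective is non-increasing.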




NMF: An Illustrative Example


Clustering Results on TDT Data


Clustering Results on Reuters Data


NMF Variants

If the data matrix X has mixed signs, then:

Singular Value Decomposition: X_± ≈ F_± G_±^T

Semi-NMF: X_± ≈ F_± G_+^T

Convex-NMF: X_± ≈ X_± W_+ G_+^T

Kernel-NMF: φ(X_±) ≈ φ(X_±) W_+ G_+^T


The Relationships Between NMF and K-means

K-means objective:

J_km = sum_c sum_{x_i in pi_c} ||x_i - f_c||^2 = sum_{i=1}^n sum_{c=1}^C g_ic ||x_i - f_c||^2 = ||X - F G^T||_F^2

Cluster center matrix: F = [f_1, f_2, ..., f_C] in R^{d x C} (d is the data dimension)

G in R^{n x C} with G_ij = g_ij, such that g_ij = 1 if x_i in pi_j; G_ij = 0 otherwise.

K-means and NMF: the same objective, only different constraints

NMF: F >= 0, G >= 0
K-means: G_ij in {0, 1}, G 1 = 1
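The identity J_km = ||X - F G^T||_F^2 is easy to verify numerically (toy data and partition of my own; columns of X are the points x_i):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((2, 8))                  # columns are the data points x_i
assign = np.array([0, 0, 0, 1, 1, 1, 1, 0])
C = 2
G = np.zeros((8, C))
G[np.arange(8), assign] = 1.0           # hard cluster indicator matrix
F = X @ G / G.sum(axis=0)               # cluster centers as columns of F

# Sum-of-squares form of the K-means objective ...
J_km = sum(np.linalg.norm(X[:, i] - F[:, assign[i]]) ** 2 for i in range(8))
# ... equals the matrix-factorization form, since the i-th column of
# X - F G^T is exactly x_i minus its assigned center.
J_mf = np.linalg.norm(X - F @ G.T, 'fro') ** 2
```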


The Relationships Between K-means and PCA

Within-cluster scatter of cluster k:

J_k = sum_{i=1}^{n_k} ||x_i^{(k)} - m_k||^2 = ||X_k - m_k e^T||^2

J_k = trace( X_k (I_{n_k} - e e^T / n_k) X_k^T )

Finally,

J = sum_{k=1}^C J_k = sum_{k=1}^C [ trace(X_k^T X_k) - (e^T/sqrt(n_k)) X_k^T X_k (e/sqrt(n_k)) ]

Let P~ = diag(e_{n_1}/sqrt(n_1), ..., e_{n_C}/sqrt(n_C))

Then J = trace(X^T X) - trace(P~^T X^T X P~) subject to P~^T P~ = I

Therefore we need to maximize trace(P~^T X^T X P~) to get P~.

According to the Ky Fan theorem, the relaxed optimal P~ is composed of the eigenvectors of X^T X corresponding to its largest C eigenvalues.

If X is centered, this is equivalent to PCA.
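A quick numerical check of the Ky Fan bound (toy data and partition of my own construction): any normalized cluster indicator P~ attains a trace value no larger than the sum of the top C eigenvalues of X^T X.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((3, 12))
X = X - X.mean(axis=1, keepdims=True)    # center the data (the PCA connection)

# Normalized indicator P~ for an arbitrary 2-way partition of the 12 points
P = np.zeros((12, 2))
P[:6, 0] = 1 / np.sqrt(6)
P[6:, 1] = 1 / np.sqrt(6)

K = X.T @ X
discrete = np.trace(P.T @ K @ P)                      # value at a discrete partition
top = np.sort(np.linalg.eigvalsh(K))[::-1][:2].sum()  # Ky Fan maximum over P~^T P~ = I
```

The eigenvector solution relaxes the discrete indicator, so it always scores at least as high.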


The Relationships Between K-means and Spectral Clustering

From the last slide, the relaxed solution of K-means is obtained by analyzing the eigenstructure of A = X^T X

If we define the similarity matrix W = A, then K-means is equivalent to ratio association

Define the weighted K-means criterion:

J~ = sum_{k=1}^C sum_{x_i in pi_k} w_i ||x_i - m_k||^2

Using a similar derivation, optimizing the above criterion is equivalent to solving

max_{P~} trace(P~^T D^{1/2} W D^{1/2} P~) (10)


The Relationships Between NMF and Spectral Clustering

Let the normalized similarity matrix be W~ = D^{-1/2} W D^{-1/2}

Then we have the following theorem:

Theorem
Normalized Cut using similarity W~ is equivalent to the following symmetric nonnegative matrix factorization:

min_{P~ >= 0} J = ||W~ - P~ P~^T||^2 (11)
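One way to attack the factorization in the theorem numerically is a damped multiplicative update for min ||W - P P^T||^2. The 1/2-damped rule below is a common choice from the symmetric-NMF literature (not spelled out on the slide); it is shown here on a random symmetric nonnegative matrix rather than an actual graph similarity.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((8, 8))
W = (A + A.T) / 2              # symmetric nonnegative "similarity" matrix
P = rng.random((8, 2)) + 0.1   # nonnegative factor, n x C

def sym_err(W, P):
    return np.linalg.norm(W - P @ P.T) ** 2

e0 = sym_err(W, P)
eps = 1e-12
for _ in range(300):
    # damped multiplicative update (step size 1/2) for min ||W - P P^T||^2
    P *= 0.5 * (1 + (W @ P) / (P @ P.T @ P + eps))
e1 = sym_err(W, P)
```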




The Problem

Usually the data we face are relational, i.e., there are multiple types of data interrelated with each other

How to cluster those relational data simultaneously?



A Spectral Approach

Define the similarity matrix on the bi-partite graph:

A = [ 0    R
      R^T  0 ]

Also the concatenated cluster membership vector x = [x_I^T, x_II^T]^T

Then the co-clustering problem becomes a graph-cut problem on the bi-partite graph, i.e., we should solve the following generalized eigenvalue problem

L x = lambda D x (12)

where D = diag(sum_j A_1j, ..., sum_j A_nj), L = D - A
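A sketch of this bi-partite cut (the 4x3 relation R is my own toy example; the small 0.1 entry is a bridge that keeps the graph connected so the second generalized eigenvector is well defined):

```python
import numpy as np

# Toy document-word relation with two blocks plus a weak bridge entry.
R = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.1],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
n1, n2 = R.shape
A = np.zeros((n1 + n2, n1 + n2))
A[:n1, n1:] = R
A[n1:, :n1] = R.T
D = A.sum(axis=1)
L = np.diag(D) - A

# Solve L x = lambda D x via the equivalent symmetric problem
# D^{-1/2} L D^{-1/2} v = lambda v with x = D^{-1/2} v.
D_is = np.diag(1 / np.sqrt(D))
vals, vecs = np.linalg.eigh(D_is @ L @ D_is)
x = D_is @ vecs[:, 1]           # second generalized eigenvector
labels = (x > 0).astype(int)    # first n1 entries: rows; rest: columns
```

Thresholding x co-clusters rows and columns in one shot: each document lands with the words it shares mass with.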





Nonnegative Matrix Tri-Factorization

Factorize the user-movie rating matrix X into three matrices F, S, G, such that
F represents the cluster memberships on the user side
G represents the cluster memberships on the movie side

By relaxing the integer constraints on F, G, we need to solve the following optimization problem

min_{F >= 0, S >= 0, G >= 0} ||X - F S G^T||^2, s.t. F^T F = I, G^T G = I (13)

We can derive multiplicative update rules to solve for the optimal F, S, G
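The update rules are not spelled out on this slide; as a simplified sketch of my own (it drops the orthogonality constraints F^T F = I, G^T G = I and keeps only nonnegativity), standard multiplicative updates for X ≈ F S G^T look like:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((20, 12))            # toy user-movie rating matrix
k1, k2 = 3, 2                       # user / movie cluster counts (hypothetical)
F = rng.random((20, k1)) + 0.1
S = rng.random((k1, k2)) + 0.1
G = rng.random((12, k2)) + 0.1

def tri_err(X, F, S, G):
    return np.linalg.norm(X - F @ S @ G.T) ** 2

e0 = tri_err(X, F, S, G)
eps = 1e-12
for _ in range(200):
    F *= (X @ G @ S.T) / (F @ S @ (G.T @ G) @ S.T + eps)
    S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
    G *= (X.T @ F @ S) / (G @ S.T @ (F.T @ F) @ S + eps)
e1 = tri_err(X, F, S, G)
```

Each update divides the negative part of the gradient by the positive part entrywise, so the factors stay nonnegative while the fit improves.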


An Example of NMTF


Other Types of Co-Clustering Methods

Information-Theoretic Co-clustering (Dhillon et al. KDD’03)

Bayesian Co-Clustering (Shan & Banerjee. ICDM’08)

Tensor Method (Banerjee et al. SDM’07)

Collective Factorization on Related Matrices (Long et al. ICML'06)

Multiple Latent Semantic Analysis (Wang et al. SIGIR’06)


Why Semi-supervised Learning

Traditional learning problems

Supervised learning: learning with labeled data set

Unsupervised learning: learning with unlabeled data set

Problems

Supervised learning: requires much human labeling effort, which is expensive and time-consuming

Unsupervised learning: unreliable

Semi-supervised learning

Learning with partially labeled data

Learning with side-information





The Similarity Between SSL and Ranking


The Similarity Between SSL and Collaborative Filtering




Semi-supervised Assumption

Smoothness Assumption: if two points x_1, x_2 in a high-density region are close, then so should be the corresponding outputs y_1, y_2

Cluster Assumption: if points are in the same cluster, they are likely to be of the same class

Manifold Assumption: the (high-dimensional) data lie (roughly) on a low-dimensional manifold


Label Propagation

Connect the data points that are close to each other (nearest neighbor graph)

Propagate the class labels over the connected graph


Propagation Rules

Initial label vector: y in R^{n x 1}
y_i = t_i if x_i is labeled as t_i; y_i = 0 if x_i is unlabeled

f_i^{(1)} = y_i if x_i is labeled; f_i^{(1)} = alpha sum_{x_j in N_i} P_ij y_j otherwise

P in R^{n x n} is the propagation matrix

Matrix form: f^{(1)} = y + alpha P y

f^{(2)} = y + alpha P f^{(1)} = (I + alpha P + alpha^2 P^2) y

Finally f^{(inf)} = sum_{i=0}^{inf} alpha^i P^i y = (I - alpha P)^{-1} y
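A minimal sketch of the propagation (1-D toy data of my own; alpha is a hypothetical choice), using the closed form f = (I - alpha P)^{-1} y with a row-normalized P:

```python
import numpy as np

# Two well-separated 1-D clusters; one labeled point in each.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
n = len(pts)
W = np.exp(-(pts[:, None] - pts[None, :]) ** 2)  # Gaussian similarities
np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)             # row-normalized P = D^{-1} W

y = np.zeros(n)
y[0], y[3] = 1.0, -1.0        # one positive and one negative seed label
alpha = 0.5                   # propagation weight (hypothetical value)

# Closed form of the geometric series: f = (I - alpha P)^{-1} y
f = np.linalg.solve(np.eye(n) - alpha * P, y)
labels = np.sign(f)
```

The series converges because the spectral radius of alpha P is below 1; the unlabeled points inherit the label of their cluster's seed.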



An Example

[Figure: four snapshots of label propagation on a two-dimensional toy data set]


The Calculation of P

Asymmetrically normalized similarity matrix: P = D^{-1} W

Symmetrically normalized similarity matrix: P = D^{-1/2} W D^{-1/2}

How to determine the optimal sigma when computing W_ij?

Linear Neighborhood Similarity:

min_{W_ij} ||x_i - sum_{x_j in N_i} W_ij x_j||^2
s.t. sum_j W_ij = 1, W_ij >= 0



A Regularization Framework

Label consistency: the predicted labels should be sufficiently close to the initial labels on the labeled data points

Label smoothness: the predicted labels should be sufficiently smooth with respect to the data manifold (graph)

min_f sum_{i=1}^l (f_i - t_i)^2 + sum_{i=l+1}^n f_i^2 + mu sum_{i~j} w_ij (f_i - f_j)^2

The first term reflects label consistency
The second term keeps the predicted label values in a reasonable range, for numerical stability
The third term reflects label smoothness

Closed-form solution: f = (I + mu L)^{-1} y
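The closed form is a single linear solve. A minimal sketch (same kind of two-cluster toy data as before, my own construction; mu is a hypothetical choice):

```python
import numpy as np

# Two well-separated 1-D clusters; solve the regularization framework.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
n = len(pts)
W = np.exp(-(pts[:, None] - pts[None, :]) ** 2)
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W   # combinatorial graph Laplacian

y = np.zeros(n)
y[0], y[3] = 1.0, -1.0           # labeled points carry +1 / -1, rest 0
mu = 1.0                         # smoothness trade-off (hypothetical value)
f = np.linalg.solve(np.eye(n) + mu * L, y)   # f = (I + mu L)^{-1} y
```

Larger mu forces f to vary less along heavy graph edges, spreading the seed labels across each cluster.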



Experimental Results on 20Newsgroup Data

autos, motorcycles, baseball, and hockey under rec

[Figure (a): classification results on 20 Newsgroup data — accuracy vs. number of labeled points, comparing LNP, Consistency, Gaussian Fields, SVM, and NN]




What is Side-Information

Types of side-information:
Must-link: a pair of points should belong to the same class
Cannot-link: a pair of points should not appear in the same class

Side-information is a type of prior knowledge weaker than partial labeling:
Knowing a partial labeling, we can transform it into side-information
But not vice versa


Pairwise Constrained K-means Clustering

K-means objective: J_km = sum_c sum_{x_i in pi_c} ||x_i - f_c||^2

Matrix form: J_km = ||X - F G^T||_F^2

Cluster center matrix: F = [f_1, f_2, ..., f_C]
G in R^{n x C} with G_ij = 1 if x_i in pi_j; G_ij = 0 otherwise.

The objective of PCKM:

J = sum_c sum_{x_i in pi_c} ||x_i - f_c||^2 + sum_{(x_i, x_j) in M, l_i != l_j} beta_ij + sum_{(x_i, x_j) in C, l_i = l_j} beta~_ij

{beta_ij >= 0}: penalties for violating the must-link constraints
{beta~_ij >= 0}: penalties for violating the cannot-link constraints



Penalized Matrix Factorization

Changing the penalties for violating the constraints in M into rewards:

J = sum_c sum_{x_i in pi_c} ||x_i - f_c||^2 - sum_{(x_i, x_j) in M, l_i = l_j} beta_ij + sum_{(x_i, x_j) in C, l_i = l_j} beta~_ij
  = sum_c sum_i G_ic ||x_i - f_c||^2 + sum_c sum_{i,j} G_ic G_jc theta_ij

theta_ij = beta~_ij if (x_i, x_j) in C; -beta_ij if (x_i, x_j) in M; 0 otherwise

Penalized matrix factorization objective:

min_{F, G} J = ||X - F G^T||_F^2 + tr(G^T Theta G), s.t. G >= 0 (14)

Theta in R^{n x n} with its (i, j)-th entry Theta_ij = theta_ij


Updating Rules for PMF

Table: Penalized Matrix Factorization for Constrained Clustering

Inputs: Data matrix X, constraint matrix Theta.
Outputs: F, G.
1. Initialize G;
2. Repeat the following steps until convergence:
(a). Fixing G, update F by F = X G (G^T G)^{-1};
(b). Fixing F, update G by
G_ij <- G_ij sqrt( [ (X^T F)^+_ij + [G (F^T F)^-]_ij + (Theta^- G)_ij ] / [ (X^T F)^-_ij + [G (F^T F)^+]_ij + (Theta^+ G)_ij ] )
where M^+ = (|M| + M)/2 and M^- = (|M| - M)/2 denote the positive and negative parts of a matrix M.


Side-Information on Bi-partite Graph


PMF on Bi-partite Graph

\[
\min_{G_1 \ge 0, \, G_2 \ge 0} J
= \|R_{12} - G_1 S G_2^T\|^2
+ \mathrm{tr}(G_1^T \Theta_1 G_1)
+ \mathrm{tr}(G_2^T \Theta_2 G_2)
\]

Table: PMF on Bi-partite Graph
Inputs: Relation matrix R_12, constraint matrices Θ_1, Θ_2.
Outputs: G_1, S, G_2.
1. Initialize G_1, G_2;
2. Repeat the following steps until convergence:

(a). Fixing G_1, G_2, update S using

\[
S \leftarrow (G_1^T G_1)^{-1} G_1^T R_{12} G_2 (G_2^T G_2)^{-1};
\]

(b). Fixing S, G_2, update G_1 using

\[
G_{1,ij} \leftarrow G_{1,ij}
\sqrt{\frac{(R_{12} G_2 S^T)^+_{ij} + [G_1 (S G_2^T G_2 S^T)^-]_{ij} + (\Theta_1^- G_1)_{ij}}
           {(R_{12} G_2 S^T)^-_{ij} + [G_1 (S G_2^T G_2 S^T)^+]_{ij} + (\Theta_1^+ G_1)_{ij}}};
\]

(c). Fixing G_1, S, update G_2 using

\[
G_{2,ij} \leftarrow G_{2,ij}
\sqrt{\frac{(R_{12}^T G_1 S)^+_{ij} + [G_2 (S^T G_1^T G_1 S)^-]_{ij} + (\Theta_2^- G_2)_{ij}}
           {(R_{12}^T G_1 S)^-_{ij} + [G_2 (S^T G_1^T G_1 S)^+]_{ij} + (\Theta_2^+ G_2)_{ij}}}.
\]
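A hedged NumPy sketch of the three alternating steps (the function name, initialization, and iteration count are assumptions; Θ_1, Θ_2 are the two constraint matrices):

```python
import numpy as np

def pmf_bipartite(R12, Theta1, Theta2, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Alternating updates for
    min ||R12 - G1 S G2^T||^2 + tr(G1^T Theta1 G1) + tr(G2^T Theta2 G2),
    with G1 >= 0 and G2 >= 0.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = R12.shape
    G1 = rng.random((n1, k1)) + 0.1
    G2 = rng.random((n2, k2)) + 0.1
    pos = lambda A: (np.abs(A) + A) / 2   # A^+
    neg = lambda A: (np.abs(A) - A) / 2   # A^-
    for _ in range(n_iter):
        # (a) closed-form S given G1, G2
        S = np.linalg.pinv(G1.T @ G1) @ G1.T @ R12 @ G2 @ np.linalg.pinv(G2.T @ G2)
        # (b) multiplicative update for G1
        A, B = R12 @ G2 @ S.T, S @ G2.T @ G2 @ S.T
        G1 *= np.sqrt((pos(A) + G1 @ neg(B) + neg(Theta1) @ G1) /
                      np.maximum(neg(A) + G1 @ pos(B) + pos(Theta1) @ G1, eps))
        # (c) multiplicative update for G2
        C, D = R12.T @ G1 @ S, S.T @ G1.T @ G1 @ S
        G2 *= np.sqrt((pos(C) + G2 @ neg(D) + neg(Theta2) @ G2) /
                      np.maximum(neg(C) + G2 @ pos(D) + pos(Theta2) @ G2, eps))
    return G1, S, G2
```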


Table: The F measure of three algorithms on the BBS data set

Data Set  Algorithm  d = 3   d = 4   d = 5   d = 6
   1      MLSA       0.7019  0.7079  0.7549  0.7541
   1      SRC        0.7281  0.6878  0.6183  0.6183
   1      Tri-SPMF   0.7948  0.8011  0.8021  0.7993
   2      MLSA       0.7651  0.7429  0.7581  0.7309
   2      SRC        0.7627  0.7226  0.7280  0.6965
   2      Tri-SPMF   0.8007  0.7984  0.7938  0.7896
   3      MLSA       0.6689  0.6511  0.6987  0.7301
   3      SRC        0.7556  0.7666  0.7472  0.7125
   3      Tri-SPMF   0.8095  0.8034  0.7993  0.7874


Tensor & Hypergraph Based Methods

In knowledge & information management, we usually face multi-relational data

Graph based methods can capture pairwise relationships

A matrix is likewise composed of only two dimensions

A hypergraph describes multiple-wise relationships more naturally

A tensor is another structure that can capture multiple-wise relationships
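To make the matrix-vs-tensor distinction concrete, here is a toy sketch (the user/item/tag scenario and all numbers are hypothetical): a third-order tensor records triple-wise relations, and summing out one mode collapses it back to a pairwise matrix, discarding information.

```python
import numpy as np

# Hypothetical multi-relational data: (user, item, tag) co-occurrences
# for 3 users, 4 items, 2 tags.  A matrix can store only one pair of
# these modes at a time; the 3rd-order tensor keeps all three.
T = np.zeros((3, 4, 2))
T[0, 1, 0] = 1.0   # user 0 tagged item 1 with tag 0
T[0, 1, 1] = 1.0   # user 0 tagged item 1 with tag 1 as well
T[2, 3, 1] = 1.0   # user 2 tagged item 3 with tag 1

# Summing out the tag mode yields the pairwise user-item matrix,
# but which tags connected each pair is lost.
user_item = T.sum(axis=2)   # shape (3, 4)
```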


Efficient & Large Scale Methods

Matrix & graph based methods usually involve high computational cost

eigenvalue decomposition

solving large scale linear equation systems

constrained optimization

How to make the algorithm more efficient?

Exploring the sparsity

How to improve scalability?

Smart sampling




Probabilistic Interpretations

Potential problems of describing the data with matrices

Too large

Too complicated

Missing entries

Noisy entries

...

Probabilistic interpretations & graphical models

Discover latent structures

Relationships with matrix based methods?


Knowledge Transfer Across Different Domains

The multi-relational data contain data points from different domains

We may easily get some prior knowledge on some domains

How to transfer the knowledge from one domain to another?

What knowledge to transfer? How?

Does it really help?


Mikhail Belkin, Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. NIPS 2001.

A. Banerjee, S. Basu, S. Merugu. Multi-way Clustering on Relation Graphs. SDM 2007.

I. S. Dhillon, Y. Guan, and B. Kulis. Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. PAMI 2007.

I. S. Dhillon, S. Mallela, and D. S. Modha. Information-Theoretic Co-clustering. KDD 2003.

I. S. Dhillon. Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning. KDD 2001.

Chris Ding, Xiaofeng He, and Horst D. Simon. On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering. SDM 2005.

Chris Ding, Tao Li, Wei Peng, Haesun Park. Orthogonal Nonnegative Matrix Tri-factorizations for Clustering. KDD 2006.

Chris Ding, Tao Li, Michael I. Jordan. Convex and Semi-Nonnegative Matrix Factorizations. Technical Report, 2006.

Chris Ding, Rong Jin, Tao Li, and Horst D. Simon. A Learning Framework Using Green's Function and Kernel Regularization with Application for Recommender System. KDD 2007.


Xiaofei He and Partha Niyogi. Locality Preserving Projections. NIPS 2003.

Xiaofei He, Deng Cai, Haifeng Liu, and Wei-Ying Ma. Locality Preserving Indexing for Document Representation. SIGIR 2004.

D. D. Lee and H. S. Seung. Learning the parts of objects with nonnegative matrix factorization. Nature 1999.

D. D. Lee and H. S. Seung. Algorithms for Non-negative Matrix Factorization. NIPS 2000.

Tao Li, Chris Ding, Yi Zhang, and Bo Shao. Knowledge Transformation from Word Space to Document Space. SIGIR 2008.

Tao Li, Chris Ding, and Michael Jordan. Solving Consensus and Semi-supervised Clustering Problems Using Nonnegative Matrix Factorization. ICDM 2007.

Tao Li and Chris Ding. The Relationships Among Various Nonnegative Matrix Factorization Methods for Clustering. ICDM 2006.

Sam Roweis and Lawrence Saul. Nonlinear dimensionality reduction by locally linear embedding. Science 2000.

Hanhuai Shan and Arindam Banerjee. Bayesian Co-clustering. ICDM 2008.

J. Shi and J. Malik. Normalized Cuts and Image Segmentation. PAMI 2000.


Fei Wang, Shouchun Chen, Tao Li, Changshui Zhang. Semi-Supervised Metric Learning by Maximizing Constraint Margin. CIKM 2008.

Fei Wang, Tao Li and Changshui Zhang. Semi-Supervised Clustering via Matrix Factorization. SDM 2008.

Fei Wang, Sheng Ma, Liuzhong Yang, Tao Li. Recommendation on Item Graphs. ICDM 2006.

Fei Wang, Changshui Zhang. Label Propagation Through Linear Neighborhoods. ICML 2006.

X. Wang, J. Sun, Z. Chen, and C. Zhai. Latent semantic analysis for multiple-type interrelated data objects. SIGIR 2006.

Wei Xu, Xin Liu, Yihong Gong. Document Clustering Based on Non-negative Matrix Factorization. SIGIR 2003.

S. Yan, D. Xu, B. Zhang and H. Zhang. Graph Embedding: A General Framework for Dimensionality Reduction. CVPR 2005.

D. Zhou, O. Bousquet, T. N. Lal, J. Weston and B. Schölkopf. Learning with Local and Global Consistency. NIPS 2003.

Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. ICML 2003.


Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.


Thank You!

http://feiwang03.googlepages.com/sdm-tutorial
