8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
1/93
ECML PKDD September ,
The Minimum Code Length for
Clustering Using the Gray Code
Mahito SUGIYAMA,Akihiro YAMAMOTOKyoto University
JSPS Research Fellow
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
2/93
Contributions
. The MCL (Minimum Code Length) A new measure to score clustering results
Needed to distinguish each cluster under some fixed encod-ing scheme for real-valued variables
. COOL (COding-Oriented cLustering) A general clustering approach
Always finds the best clusters (i.e.,the global optimal solution)which minimizes the MCL in O(nd)
Parameter tuning is not needed. G-COOL (COOL with the Gray code)
Achieves internal cohesion and external isolation
Finds arbitrary shaped clusters
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
3/93
Demonstration (Synthetic Dataset)
0
0.5
1
0 0.5 1
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
4/93
G-COOL
0
0.5
1
0 0.5 1
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
5/93
K-means
0
0.5
1
0 0.5 1
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
6/93
Results (Real datasets)
Binary
ltering
Originalimage
Delta Dragon Europe Norway Ganges
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
7/93
Results (Real datasets)
G-C
OOL
K-m
eans
Delta Dragon Europe Norway Ganges
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
8/93
Outline
. Overview. Background and Our Strategy
. MCL and Clustering
. COOL Algorithm
. G-COOL: COOL with the Gray Code
. Experiments
. Conclusion
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
9/93
Outline
. Overview. Background and Our Strategy
. MCL and Clustering
. COOL Algorithm
. G-COOL: COOL with the Gray Code
. Experiments
. Conclusion
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
10/93
Clustering Focusing on Compression
The MDL approach [Kontkanen et al., ] Data encoding has to be optimized
All encoding schemes are (implicitly) considered The time complexity O(n2)
The Kolmogorov complexity approach [Cilibrasi, ] Measures the distance between data points based on com-
pression of finite sequences
Difficult to apply multivariate data Actual clustering process is the traditional agglomerative hi-erarchical clustering
The time complexity O(n2) Both approaches are not suitable for massive data
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
11/93
Our Strategy
Requirements:. Fast, and linear in the data size. Robust to changes in input parameters. Can find arbitrary shaped clusters
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
12/93
Our Strategy
Requirements:. Fast, and linear in the data size. Robust to changes in input parameters. Can find arbitrary shaped clusters
Solutions:. Fix an encoding scheme for continuous variables
Motivated by Computable Analysis [Weihrauch, ]
. Clustering = Discretizing real-valued data
Always finds the best results w.r.t. the MCL. Use the Gray code for real numbers [Tsuiki, ]
Discretized data points are overlapped and adjacent clus-ters are merged
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
13/93
Outline
. Overview. Background and Our Strategy
. MCL and Clustering
. COOL Algorithm
. G-COOL: COOL with the Gray Code
. Experiments
. Conclusion
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
14/93
MCL (Minimum Code Length)
The MCL is the code length of the maximally compressedclusters by using a fixed encoding scheme
The MCL is calculated in O(nd) by using radix sort n and dare the number of data and dimension, resp.
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
15/93
MCL (Minimum Code Length)
The MCL is the code length of the maximally compressedclusters by using a fixed encoding scheme
The MCL is calculated in O(nd) by using radix sort n and dare the number of data and dimension, resp.
Example: X= {0.1, 0.2, 0.8, 0.9},
1 = {{0.1, 0.2}, {0.8, 0.9}}2 = {{0.1}, {0.2, 0.8}, {0.9}}
Use binary encoding Which is preferred?
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
16/93
Binary Encoding
0 1
Position
0
1
2
3
4
0.50.1 0.2
00011...
00110...
0.8 0.9
11001...
11100...
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
17/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
id value
A .B .C .D .
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
18/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
id value
A .B .C .D .
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
19/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
0 1Lv. 1
id value
A .B .C .D .
MCL = 1 + 1 = 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
20/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
0 1Lv. 1
id value
A .B .C .D .
MCL = 1 + 1 = 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
21/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
0 1Lv. 1
id value
A .B .C .D .
MCL = 1 + 1 = 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
22/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
id value
A .B .C .D .
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
23/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
0 1Lv. 1
00 10 1101Lv. 2
Lv. 3000 001 110 111
id value
A .B .C .D .
MCL = 3 4 = 12
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
24/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
0 1Lv. 1
00 10 1101Lv. 2
Lv. 3000 001 110 111
id value
A .B .C .D .
MCL = 3 4 = 12
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
25/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
0 1Lv. 1
00 10 1101Lv. 2
Lv. 3000 001 110 111
id value
A .B .C .D .
MCL = 3 4 = 12
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
26/93
MCL with Binary Encoding
0 0.5 10.25 0.75
A B C D
0 1Lv. 1
00 10 1101Lv. 2
Lv. 3000 001 110 111
id value
A .B .C .D .
MCL = 3 4 = 12
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
27/93
Definition of MCL
Fix an embedding :
d
( = {0
,1
} usually) For p range() and P range(), define
(p P) {w p wand P v= for all vsuch that |v| = |w| and p v }
Each element in (p P) is a prefix that discriminates p from PFor a partition = {C1, , CK} of a data set X,MCL() i{1,,K} Li(), where
Li() min{|W| (Ci) WandWxCi((x) (X Ci)) }
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
28/93
Minimizing MCL and Clustering
Clustering under the MCL criterion istofindthe global op-timal solution that minimizes the MCL
Findop such thatop argmin
(X)K
MCL(),
where(X)K = { is a partition ofX #C K}
We give the lower bound of the number of clusters Kas a
input parameter op becomes one set {X} without this assumption
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
29/93
Outline
. Overview. Background and Our Strategy
. MCL and Clustering
. COOL Algorithm
. G-COOL: COOL with the Gray Code
. Experiments
. Conclusion
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
30/93
Optimization by COOL
COOL solves the optimization problem in O(nd) n and dare the number of data and dimension, resp. The nave approach takes exponential time and space
Computing process of the MCL becomes clustering process it-self via discretization
COOL is level-wise, and makes the level-k partition kfrom k= 1, 2, , which holds the following condition: For all x,y X, they are in the same cluster
v= wfor some v (x) and w (y) with |v| = |w| = k Level-kpartitions form hierarchy For C k, there exists k+1 such that = C
For all C op, there exists ksuch that C k
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
31/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D Eid value
A .
B .C .D .E .
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
32/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
id value
A .
B .C .D .E .
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
33/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
id value
A 0
B 0C 1D 1E 1
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
34/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
id value
A 0
B 0C 1D 1E 1
MCL = 1 + 1 = 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
35/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
00 10 1101Lv. 2
id value
A 00
B 01C 10D 10E 11
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
36/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
00 10 1101Lv. 2
id value
A 00
B 01C 10D 10E 11
MCL = 2 4 = 8
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
37/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
00 10 1101Lv. 2
100 101Lv. 3
id value
A 00
B 01C 100D 101E 11
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
38/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
00 10 1101Lv. 2
100 101Lv. 3
id value
A 00
B 01C 100D 101E 11
MCL = 6 + 6 = 12
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
39/93
Noise Filtering by COOL
Noise filtering is easily implemented in COOL
DefineN {C #C N} for a partition See a cluster Cas noises if #C< N
Example: Given = {{0.1}, {0.4, 0.5, 0.6}, {0.9}}
2 = {{0.4, 0.5, 0.6}}, and 0.1 and 0.9 are noises We input the lower bound Nof the cluster size as a input
parameter
0 0.5 10.25 0.75
N= 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
40/93
Noise Filtering by COOL
Noise filtering is easily implemented in COOL
DefineN {C #C N} for a partition See a cluster Cas noises if #C< N
Example: Given = {{0.1}, {0.4, 0.5, 0.6}, {0.9}}
2 = {{0.4, 0.5, 0.6}}, and 0.1 and 0.9 are noises We input the lower bound Nof the cluster size as a input
parameter
0 0.5 10.25 0.75
N= 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
41/93
Algorithm of COOL
Input: A data set X, two lower bounds Kand NOutput: The optimal partitionop and noisesfunction COOL(X, K, N): Find partitions1N, ,mN such that m1N < K mN: (op, MCL(op)) FINDCLUSTERS(X, K, {1N, ,mN}): return (
op, Xop)
function FINDCLUSTERS(X, K, {1, ,m}): Find ksuch that k1 < Kand k K: op k: ifK= 2 then return (op, MCL(op)): for each Cin
1
k1
: (, L) FINDCLUSTERS(X C, K 1, {1, ,k}): ifMCL( C) < MCL(op) thenop C : return (op, MCL(op))
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
42/93
Outline
. Overview. Background and Our Strategy
. MCL and Clustering
. COOL Algorithm
. G-COOL: COOL with the Gray Code
. Experiments
. Conclusion
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
43/93
Gray Code
Real numbers in [0, 1] are encoded with
0
,1
, andBinary: 0.1 00011 ,0.25 00111Gray: 0.1 00010 ,0.25 0100
Originally, another binary encoding of natural numbers
Especially important in applications of conversion betweenanalog and digital information [Knuth, ]
The Gray code embedding is an injection G that mapsx [0,1]to an infinite sequence p0p1p2 , where pi 1 if 2
i
m2(i+1) < x< 2
i
m + 2(i+1) for an odd m,pi 0if the same holds for an even m,andpi ifx= 2im2(i+1)
for some integer m For a vector x= (x1, ,xd), G(x) = p11pd1p12pd2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
44/93
Gray Code Embedding
0 0.5 1
0
1
2
35
Po
sition
0.8
10101...
0.25
0100...
0.1
00010...
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
45/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D Eid value
A .
B .C .D .E .
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
46/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
1
id value
A .
B .C .D .E .
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
47/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
1
id value
A 0
B 0, 1C 1, 1D 1, 1E 1
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
48/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
1
id value
A 0
B 0, 1C 1, 1D 1, 1E 1
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
49/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
1
id value
A 0
B 0, 1C 1, 1D 1, 1E 1
MCL = 1 2 = 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
50/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
100 10 1101Lv. 2
01 10 11
0 1Lv. 1
0 1Lv. 1
id value
A 00
B 01, 10C 10, 10D 10, 11E 11, 11
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
51/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
100 10 1101Lv. 2
01 10 11
0 1Lv. 1
0 1Lv. 1
id value
A 00
B 01, 10C 10, 10D 10, 11E 11, 11
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
52/93
COOL with Gray Code (G-COOL)
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
100 10 1101Lv. 2
01 10 11
0 1Lv. 1
0 1Lv. 1
id value
A 00
B 01, 10C 10, 10D 10, 11E 11, 11
MCL = 2 3 = 6
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
53/93
COOL with Binary Encoding
0 0.5 10.25 0.75
A B C D E
0 1Lv. 1
id value
A 0
B 0C 1D 1E 1
MCL = 1 + 1 = 2
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
54/93
Theoretical Analysis of G-COOL
Use the Gray code as a fixed encoding in COOL It achieves internal cohesion and external isolation
Theorem: For the level-kpartition k, x,y Xare in thesame cluster ifd(x,y) < 2(k+1)
Thus x,yare in the different clusters only ifd(x,y) 2(k+1)
d(x,y) = maxi{1,,d}|xiyi| (L metric) Two adjacent intervals overlap and they are agglomerated
Corollary: In the optimal partitionop, for allx C(C
op), its nearest neighbory C yis nearest neighbor ofx y argminyXd(x,y)
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
55/93
Demonstration of G-COOL
0 0.5 1
0
0.5
1
G-COOL
0 0.5 1
0
0.5
1
COOL with the binary encoding
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
56/93
Outline
. Overview
. Background and Our Strategy
. MCL and Clustering
. COOL Algorithm
. G-COOL: COOL with the Gray Code
. Experiments
. Conclusion
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
57/93
Experimental Methods
Analyze G-COOL empirically with synthetic and realdatasets compared to DBSCAN and K-means Synthetic datasets were generated by the R package cluster-
Generation [Qiu and Joe, ]
n = 1, 500 for each cluster and d= 3 Real datasets were geospatial images from Earth-as-Art
reduced to pixels, translated into binary images All data were normalized by min-max normalization
G-COOL was implemented by R (version ..)
Internal and External measure were used Internal: MCL, connectivity, Silhouette width
External: adjusted Rand index
/
http://eros.usgs.gov/imagegallery/8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
58/93
Results (Synthetic datasets)
MCL
G-COOL
DBSCAN
K-means
50000
5000
Number of clusters
2 4 6
500
Data show mean s.e.m.
Each experiment was performed20 times
Bad
Good
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
59/93
Results (Synthetic datasets)
Connectivity Silhouette width1000
1
100
Number of clusters2 4 6
0.1
10
0.6
0.3
0.5
Number of clusters2 4 6
0.4
G-COOL
DBSCAN
K-means
Good
Bad
Bad
Good
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
60/93
Results (Synthetic datasets)
Runtime (s)Adjusted Rand index
20
5
15
Number of clusters2 4 6
0
10
1.0
0.3
0.9
Number of clusters2 4 6
0.6
G-COOL
DBSCAN
K-means
Good
Bad
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
61/93
Results (Synthetic datasets)
MCLG-COOL
Data show mean s.e.m.
Each experiment was performed20 times
5000
3000
00 4 62 8 10
4000
2000
1000
The noise parameter N
Bad
Good
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
62/93
Results (Synthetic datasets)
Connectivity Silhouette width
80
20
60
0
40
0.6
0.2
0.4
0 4 62 8 10 0 4 62 8 10
The noise parameter N The noise parameter N
0
Good
Bad
Bad
Good
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
63/93
Results (Synthetic datasets)
Runtime (s)Adjusted Rand index
2.0
0.5
1.5
0 4 60
1.0
1.0
0.4
0.8
0.6
2 8 10The noise parameter N
0 4 62 8 10The noise parameter N
0.2
0
Good
Bad
/
l l d
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
64/93
Results (Real datasets)
Binar
yltering
Originalimage
Delta Dragon Europe Norway Ganges
/
R l (R l d )
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
65/93
Results (Real datasets)
G-COOL
K-means
Delta Dragon Europe Norway Ganges
/
R l (R l d )
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
66/93
Results (Real datasets)
Name n K Running time (s) MCL
GC KM GC KM
Delta . . Dragon . . Europe . . Norway . . Ganges . .
GC: G-COOL, KM: K-means
/
O li
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
67/93
Outline
. Overview
. Background and Our Strategy
. MCL and Clustering
. COOL Algorithm
. G-COOL: COOL with the Gray Code
. Experiments
. Conclusion
/
C l i
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
68/93
Conclusion
Integrate clustering and its evaluation in the coding-oriented manner An effective solution for two essential problems, how to mea-
sure goodness of results and how to find good clusters
No distance calculation and no data distribution Key ideas:
. Fixof an encoding scheme for real-valued variables Introduced the MCL focusing on compression of clusters
Formulated clustering with the MCL, and constructed COOLthat finds the global optimal solution linearly
. The Gray code We showed efficiency and effectiveness ofG-COOL by the-
oretically and experimentally
/
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
69/93
Appendix
A-/A-
N t ti ( / )
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
70/93
Notation (/)
A datum x d, a data set X= {x1, ,xn} #Xis the number of elements in X X Yis the relative complement ofYin X
Clustering is partition ofXinto Ksubsets (clusters) C1, , CK Ci and Ci Cj = We call = {C
1, , CK} a partition ofX
(X) = { is a partition ofX} The set of finite and infinite sequences over an alphabet are
denoted by and , resp. The length
|w
|is the number of symbols other than
Ifw= 11100 , then |w| = 5 For a set of sequences W, |W| = wW|w|
A-/A-
N t ti ( / )
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
71/93
Notation (/)
An embedding ofd is an injective function fromd to For p, q
, define p q ifpi = qi for all iwith pi Intuitively, q is more concrete than p
For w , we write w p ifw p w= {p range() w p} for w W= {p range() w p for some w W} for W
The following monotonicity holds
1(v) 1(w) iffv w
A-/A-
O ti i ti b COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
72/93
Optimization by COOL
The optimal partitionop can be constructed by the level-kpartitions
For all C op, there exists ksuch that C k
The level-kpartitions have the hierarchical structure For each C k we have = Cfor some D k+1 COOL is similar to divisive hierarchical clustering
COOL always outputs the global optimal partitionop
The time complexity is O(nd) (best) and O(nd+ K! ) (worst) Usually K n holds, hence O(nd)
A-/A-
Cl t i P f COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
73/93
Clustering Process of COOL
0 1
0
1
A-/A-
Cl t i P f COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
74/93
Clustering Process of COOL
0 1
0
1 K= 2
A-/A-
Cl t i P f COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
75/93
Clustering Process of COOL
0 1
0
1 K= 2
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
76/93
Clustering Process of COOL
0 10
1 K= 2
Lv. 1
Cluster
1 2 31
2 3
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
77/93
Clustering Process of COOL
0 10
1 K= 2
Lv. 1
Cluster
1 2 31
2 3
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
78/93
Clustering Process of COOL
0 10
1
Lv. 1
Cluster
1 2 31
2 3K= 5
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
79/93
Clustering Process of COOL
0 10
1
Lv. 1
Cluster
1 2 31
2 3K= 5
Lv. 2
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
80/93
Clustering Process of COOL
0 10
1
Lv. 1
ClusterK= 5
Lv. 21 2
3
5
4 6
21 3 4 5 6
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
81/93
Clustering Process of COOL
0 10
1
Lv. 1
ClusterK= 5
Lv. 2
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
82/93
Clustering Process of COOL
0 10
1
Lv. 1
ClusterK= 5
Lv. 2
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
83/93
Clustering Process of COOL
0 10
1
Lv. 1
ClusterK= 5
Lv. 2
A-/A-
Clustering Process of COOL
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
84/93
Clustering Process of COOL
0 10
1
Lv. 1
ClusterK= 5
Lv. 2
A-/A-
TheMulti Dimensional Gray Code
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
85/93
TheMulti-Dimensional Gray Code
Usethewrappingfunction (p1, ,pd) p11pd1p12pd2Define the d-dimensional Gray code embedding dG:,d by dG(x1, ,xd) (G(x1), , G(xd))
We abbreviate dofdG if it is understood from the context
A-/A-
Internal Measures
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
86/93
Internal Measures
Connectivity [Handl et al., ]
Conn() = xXMi=1 f(x, nn(x, i))/i
nn(x,j) isthe i-th neighbor ofx, f(x,y) is 0 if xandybe-long to the same cluster, and 1 otherwise M is an input parameter (we set as )
Takes values from 0 to, should be minimized Silhouette width The average of Silhouette value S(x) for each x
S(x) = (b(x) a(x)/ max(b(x), a(x))) a(x) = C
1 yCd(x,y) (x C) b(x) = minDCD1 yD d(x,y) Takes values from 1 to 1, should be maximized
A-/A-
External Measures
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
87/93
External Measures
Adjusted Rand index Let the result be = {C1, , CK} and the correct parti-tion be = {D1, , DM} Suppose nij {x X x Ci,x Dj}. Then
i,j nijC2 (iCiC2 h DjC2)/nC221(iCiC2 + h DjC2) (iCiC2 h DjC2)/nC2
A-/A-
Discussion
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
88/93
Discussion
Results for synthetic datasets Best performance under the internal measures (nearly) Best performance under the internal measures G-COOL is efficient and effective DBSCAN is sensitive to input parameters
The MCL works well as an internal measure
Results for real datasets not good, and not bad
There are no clear clusters originally G-COOL is a good clustering method
A-/A-
RelatedWork
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
89/93
RelatedWork
Partitional methods [Chaoji et al., ]
Mass-based methods [Ting and Wells, ]
Density-based methods (DBSCAN [Ester et al., ]
Hierarchical clustering methods
(CURE [Guha et al., ], CHAMELEON [Karypis et al.,])
Grid-based methods(STING [Wang etal., ], WaveCluster [Sheikholeslami etal.
, ])
A-/A-
Future Works
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
90/93
Future Works
Speeding up by using tree-structures such as BDD
Apply to anomaly detection
Theoretical analysis, in particular relation with Com-putable Analysis Admissibility is a key property
A-/A-
References
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
91/93
References
[Berkhin, ] P. Berkhin. A survey of clustering data mining techniques.Grouping Multidimensional Data, pages , .
[Chaoji et al., ] V. Chaoji, M. A. Hasan, S. Salem, and M. J. Zaki. SPARCL:An effective and efficient algorithm for mining arbitrary shape-basedclusters. Knowledge and Information Systems, ():, .
[Cilibrasi and Vitnyi, ] R. Cilibrasi and P. M. B. Vitnyi. Cluster-ing by compression. IEEE Transactions on Information Theory,():, .
[Ester et al., ] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databaseswith noise. In Proceedings of KDD, , , .
[Guha et al., ] S. Guha, R. Rastogi, and K. Shim. CURE: An effi-cient clustering algorithm for large databases. Information Systems,():, .
[Handl et al., ] J. Handl, J. Knowles, and D. B. Kell. Computational
A-/A-
cluster validation in post-genomic data analysis Bioinformatics
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
92/93
cluster validation in post-genomic data analysis. Bioinformatics,():, .
[Karypis et al., ] G. Karypis, H. Eui-Hong, and V. Kumar. CHAMELEON:
Hierarchical clustering using dynamic modeling. Computer,():, .
[Kontkanen and Myllymki, ] P. Kontkanen and P. Myllymki. An em-pirical comparison of NML clustering algorithms. In Proceedings ofInformation Theory and Statistical Learning, .
[Kontkanen et al., ] P. Kontkanen, P. Myllymki, W. Buntine, J. Rissanen,and H. Tirri. An MDL framework for data clustering. In P. Grnwald, I. J.Myung, and M. Pitt, editors,Advances in Minimum Description Length:Theory and Applications. MIT Press, .
[Knuth, ] D. E. Knuth. The Art of Computer Programming, Volume , Fas-cicle : Generating All Tuples and Permutations. Addison-Wesley Pro-fessional, .
[Qiu and Joe, ] W. Qiu and H. Joe. Generation of random clusters withspecified degree of separation. Journal of Classification, :,.
A-/A-
[Sheikholeslami et al ] G Sheikholeslami S Chatterjee and
8/3/2019 The Minimum Code Length for Clustering Using the Gray Code
93/93
[Sheikholeslami et al., ] G. Sheikholeslami, S. Chatterjee, andA. Zhang. WaveCluster: A multi-resolution clustering approach forvery large spatial databases. In Proceedings of the th InternationalConference on Very Large Data Bases, pages , .
[Ting and Wells, ] K. M. Ting and J. R. Wells. Multi-dimensional massestimation and mass-based clustering. In Proceedings of th IEEE In-ternational Conference on Data Mining, pages , .
[Tsuiki, ] Hideki Tsuiki. Real number computation through Gray codeembedding. Theoretical Computer Science, ():, .
[Wang et al., ] W. Wang, J. Yang, and R. Muntz. STING: A statisticalinformation grid approach to spatial data mining. In Proceedingsof the rd International Conference on Very Large Data Bases, pages, .
[Weihrauch, ] K. Weihrauch. Computable Analysis. Springer, .
Top Related