7/28/2019 New link based approach for categorical data clustering
NEW LINK BASED APPROACH
FOR CATEGORICAL DATA CLUSTERING
By,
CHIRANTH B O
4th Sem M.Tech
KNOWLEDGE AND DATA ENGINEERING
July 22, 2013 2
Presentation Outline
Introduction to Clustering
Abstract
Existing System
Proposed System
Experimental Design
Experimental Results
Conclusion
Clustering: Introduction
Clustering: grouping similar kinds of data.
Data clustering concerns how to group a set of objects based on the
similarity of their attributes.
Main methods
Partitioning: K-Means
Hierarchical: BIRCH, ROCK
Density-based: DBSCAN
A good clustering method produces high-quality clusters with
high intra-class similarity
low inter-class similarity
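The two quality criteria above can be made concrete with a small sketch. It assumes a simple-matching similarity (the fraction of attributes on which two records agree), which is one common choice for categorical data and is not taken from the slides; the toy records and function names are hypothetical.

```python
from itertools import combinations

def simple_matching(a, b):
    """Fraction of attributes on which two categorical records agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def intra_similarity(cluster):
    """Mean pairwise similarity of records inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(simple_matching(a, b) for a, b in pairs) / len(pairs)

def inter_similarity(cluster_a, cluster_b):
    """Mean similarity over all cross-cluster record pairs."""
    total = sum(simple_matching(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Hypothetical toy records: (colour, shape, size)
cluster_1 = [("red", "round", "small"), ("red", "round", "large")]
cluster_2 = [("blue", "square", "large"), ("blue", "square", "small")]

intra = intra_similarity(cluster_1)             # high value wanted
inter = inter_similarity(cluster_1, cluster_2)  # low value wanted
print(intra, inter)
```

A good partition of this toy data scores higher on the first number than on the second.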
ABSTRACT
Existing categorical data clustering methods generate
results based on incomplete information.
This degrades the quality of the clustering result.
This paper presents a new link-based approach for
categorical data clustering that improves results by
discovering unknown entries through the similarity between clusters.
Existing Methods
K-means cannot be applied directly to categorical data, since means
and Euclidean distances are undefined for categorical values.
SQUEEZER and CACTUS generate the final clustering
using incomplete information.
Many data entries are left unknown.
Proposed Methods
The link-based approach improves the
matrix by discovering the unknown
entries.
An efficient link-based algorithm is used to find the similarity between clusters.
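As a rough illustration of how unknown entries might be filled in, the sketch below assumes that a zero in the binary matrix is an "unknown" entry and that it is replaced by the similarity between that cluster and the cluster the object actually belongs to in the same base clustering. The function name, the column layout, and the similarity values are hypothetical.

```python
def refined_matrix(bm, columns, sim):
    """Fill the zero entries of a binary cluster-association matrix.

    bm:      N x K binary matrix (list of rows).
    columns: columns[j] = (base clustering index, cluster label).
    sim:     K x K cluster similarity matrix with values in [0, 1].
    """
    rm = []
    for row in bm:
        # Column of the cluster this object belongs to, per base clustering.
        own = {columns[j][0]: j for j, v in enumerate(row) if v == 1}
        rm.append([
            1.0 if v == 1 else sim[j][own[columns[j][0]]]
            for j, v in enumerate(row)
        ])
    return rm

# Hypothetical input: 2 base clusterings with 2 clusters each.
columns = [(0, "a"), (0, "b"), (1, "p"), (1, "q")]
bm = [[1, 0, 1, 0],
      [1, 0, 0, 1]]
sim = [[1.0, 0.3, 0.0, 0.0],   # assumed similarities, for illustration only
       [0.3, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.6],
       [0.0, 0.0, 0.6, 1.0]]

rm = refined_matrix(bm, columns, sim)
print(rm)
```

Known memberships stay at 1, while each former zero now carries a graded value instead of a hard "not a member".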
Introduction to NLCD
Designed for very large data sets:
Time and memory are limited
Only one scan of the data is necessary
Does not need the whole data set in advance
Two key modules:
Scan the database to build a binary matrix.
Build the refined matrix using the Weighted Triple Quality (WTQ) algorithm.
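The first module can be sketched as follows, assuming the binary matrix has one row per data object and one column per cluster of every base clustering, with a 1 marking membership. The input format (one label list per base clustering) and the names are assumptions made for illustration.

```python
def binary_matrix(base_clusterings):
    """Build the N x K binary cluster-association matrix.

    base_clusterings: list of M label lists, each of length N.
    Returns (matrix, columns), where columns[j] = (clustering index, label).
    """
    columns = []
    for t, labels in enumerate(base_clusterings):
        for label in sorted(set(labels)):
            columns.append((t, label))
    n = len(base_clusterings[0])
    matrix = [
        [1 if base_clusterings[t][i] == label else 0 for t, label in columns]
        for i in range(n)
    ]
    return matrix, columns

# Hypothetical ensemble: M = 2 base clusterings of N = 4 objects.
ensemble = [
    ["a", "a", "b", "b"],
    ["p", "q", "q", "q"],
]
bm, cols = binary_matrix(ensemble)
for row in bm:
    print(row)
```

Each row contains exactly M ones, one per base clustering; every other entry is a zero that the second module is meant to refine.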
Basic process
[Figure: Dataset X -> Clustering 1, Clustering 2, ..., Clustering M -> Consensus Function]
[Figure: panels labelled Binary Matrix, Pairwise-Similarity Matrix, Clustering]
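One plausible reading of the step from the binary matrix to the pairwise-similarity matrix is a co-association matrix: the similarity of two objects is the fraction of base clusterings that place them in the same cluster. This interpretation is an assumption; the slide does not define the matrix.

```python
def pairwise_similarity(bm, m):
    """Co-association matrix from a binary cluster-association matrix.

    bm: N x K binary matrix whose rows each contain exactly m ones.
    m:  number of base clusterings.
    """
    n = len(bm)
    return [
        [sum(a * b for a, b in zip(bm[i], bm[j])) / m for j in range(n)]
        for i in range(n)
    ]

# Hypothetical binary matrix for 4 objects and 2 base clusterings.
bm_example = [[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1],
              [0, 1, 0, 1]]
ps = pairwise_similarity(bm_example, m=2)
for row in ps:
    print(row)
```

The resulting N x N matrix can then be handed to any similarity-based clustering algorithm to obtain the final partition.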
Weighted Triple Quality
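ALGORITHM - WTQ ($G$, $C_x$, $C_y$)
$G = (V, W)$, a weighted graph, where $C_x, C_y \in V$;
$N_k \subset V$, a set of adjacent neighbors of $C_k \in V$;
$W_k = \sum_{C_t \in N_k} w_{tk}$;
$WTQ_{xy}$, the WTQ measure of $C_x$ and $C_y$;
$WTQ_{xy} \leftarrow 0$
For each $c \in N_x$
    If $c \in N_y$
        $WTQ_{xy} \leftarrow WTQ_{xy} + \frac{1}{W_c}$
Return $WTQ_{xy}$
Following that, the similarity between clusters $C_x$ and $C_y$ can be estimated by
$Sim(C_x, C_y) = \frac{WTQ_{xy}}{WTQ_{max}} \times DC$,
where $WTQ_{max}$ is the maximum WTQ value over all pairs of clusters and $DC \in (0, 1]$ is a constant decay factor.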
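A minimal sketch of the WTQ procedure, under stated assumptions: the graph is stored as a dictionary of neighbor weights, each shared neighbor c contributes 1/W_c (with W_c the total weight of the edges at c), and the final similarity is the WTQ value divided by the largest WTQ value of any cluster pair, scaled by a decay factor DC in (0, 1]. The toy graph and the default `dc=0.9` are hypothetical.

```python
def wtq(graph, cx, cy):
    """WTQ measure of clusters cx and cy.

    graph: dict mapping each cluster to {neighbor: edge weight}.
    """
    score = 0.0
    for c in graph[cx]:                    # each neighbor c of cx
        if c in graph[cy]:                 # that is also a neighbor of cy
            w_c = sum(graph[c].values())   # W_c: total edge weight at c
            score += 1.0 / w_c
    return score

def similarity(graph, cx, cy, dc=0.9):
    """Sim(cx, cy) = WTQ_xy / WTQ_max * DC (dc is an assumed decay factor)."""
    clusters = list(graph)
    wtq_max = max(
        wtq(graph, p, q)
        for i, p in enumerate(clusters)
        for q in clusters[i + 1:]
    )
    if wtq_max == 0:
        return 0.0
    return wtq(graph, cx, cy) / wtq_max * dc

# Hypothetical undirected cluster network with four clusters.
graph = {
    "A": {"C": 0.5, "D": 0.25},
    "B": {"C": 0.5, "D": 0.25},
    "C": {"A": 0.5, "B": 0.5},
    "D": {"A": 0.25, "B": 0.25},
}
print(wtq(graph, "A", "B"), similarity(graph, "A", "B"))
```

Clusters A and B share no edge here, yet they receive a high similarity because they have the neighbors C and D in common; a shared neighbor with a small total weight counts for more.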
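Overlapping Members
$w_{xy} \in W$, where $C_x, C_y \in V$
Cluster Network
$w_{xy} = \frac{|L_x \cap L_y|}{|L_x \cup L_y|}$,
where $L_z$ denotes the set of data points belonging to cluster $C_z$.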
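The cluster network can be sketched by weighting each edge with the proportion of overlapping members between two clusters, that is, the size of the intersection of their member sets divided by the size of their union. This Jaccard-style weight, the input format, and the names below are assumptions made for illustration.

```python
def edge_weight(members_x, members_y):
    """Overlap of two member sets: |Lx & Ly| / |Lx | Ly|."""
    union = members_x | members_y
    return len(members_x & members_y) / len(union) if union else 0.0

def cluster_network(ensemble):
    """Weighted graph over all clusters of all base clusterings.

    ensemble: list of M label lists, each of length N.
    Returns a dict mapping each cluster to {neighbor: weight}; pairs
    with no shared member get no edge.
    """
    members = {}
    for t, labels in enumerate(ensemble):
        for i, label in enumerate(labels):
            members.setdefault((t, label), set()).add(i)
    network = {c: {} for c in members}
    for cx in members:
        for cy in members:
            if cx != cy:
                w = edge_weight(members[cx], members[cy])
                if w > 0:
                    network[cx][cy] = w
    return network

# Hypothetical ensemble: 2 base clusterings of 4 objects.
net = cluster_network([["a", "a", "b", "b"], ["p", "q", "q", "q"]])
for cluster, edges in net.items():
    print(cluster, edges)
```

Clusters from the same base clustering never share members, so edges only connect clusters that come from different base clusterings.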
Experimental Results
Input parameters:
Memory (M): 5% of data set
Disk space (R): 20% of M
Initial threshold (T): 0.0
Page size (P): 1024 bytes
Experimental Results
KMEANS clustering
No   Time   D      # Scan    DS   Time   D      # Scan
1    43.9   2.09   289       1o   33.8   1.97   197
2    13.2   4.43   51        2o   12.7   4.20   29
3    32.9   3.66   187       3o   36.0   4.35   241
NLCD clustering
No   Time   D      # Scan    DS   Time   D      # Scan
1    11.5   1.87   2         1o   13.6   1.87   2
2    10.7   1.99   2         2o   12.1   1.99   2
3    11.4   3.95   2         3o   12.2   3.99   2
Conclusions
NLCD, a new link-based clustering method, stores the
clustering features in a matrix.
Given a limited amount of main memory, NLCD
can minimize the time required for I/O.
The problem of constructing the refined matrix is
efficiently resolved using the similarity among
categorical clusters.
Future Work
First, an extensive study of the behavior of other
link-based similarity measures within this problem context.
Second, application of the new method to specific domains,
including tourism and medical data sets.
Q&A
Thank you for your patience