Determining the number of clusters using information entropy for mixed data

Presenter: Hong-Yi Cai

Authors: Jiye Liang, Xingwang Zhao, Deyu Li, Fuyuan Cao, Chuangyin Dang

Pattern Recognition (PR), 2012


Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments


Motivation

• Determining the initial parameters of a clustering algorithm, in particular the number of clusters, is one of the most difficult problems in cluster analysis.

• No existing clustering algorithm clusters mixed data sets, which contain both numerical and categorical attributes, effectively.


Objectives

• To propose a generalized mechanism for mixed data sets that integrates Rényi entropy (for numerical data) and complement entropy (for categorical data).

• To improve the k-prototypes algorithm by using the new generalized mechanism.


Methodology

• K-Prototype…
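The slide's own formulas did not survive extraction. As a reminder of the general idea behind k-prototypes, here is a minimal sketch of the usual Huang-style mixed dissimilarity: squared Euclidean distance on the numerical attributes plus a weighted count of mismatches on the categorical attributes. The function names, the `gamma` weight and its default value are assumptions made for illustration, not taken from the slides.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Dissimilarity between an object and a prototype on mixed data.

    x_num / proto_num: sequences of numerical attribute values.
    x_cat / proto_cat: sequences of categorical attribute values.
    gamma: assumed weight balancing the categorical part against the numerical part.
    """
    # Numerical part: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching (number of attributes that differ).
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(x_num, x_cat, prototypes, gamma=1.0):
    """Index of the closest prototype; `prototypes` is a list of (num, cat) pairs."""
    costs = [mixed_dissimilarity(x_num, x_cat, pn, pc, gamma)
             for pn, pc in prototypes]
    return costs.index(min(costs))
```

For example, `mixed_dissimilarity((1.0, 2.0), ("a", "b"), (0.0, 2.0), ("a", "c"), gamma=0.5)` gives `1.5`: a squared distance of 1.0 plus half of one categorical mismatch.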


Methodology

• A generalized mechanism for numerical data…


Rényi entropy:

Parzen window density estimation:

By the convolution theorem…
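The three labels above correspond to a standard construction, sketched here because the slide's formulas are missing. The quadratic Rényi entropy is H = -log ∫ f(x)^2 dx. If f is replaced by a Parzen window estimate with Gaussian kernels of variance sigma^2, the integral of a product of two Gaussians collapses to a single Gaussian of variance 2*sigma^2, so the estimate becomes H = -log((1/N^2) * sum_i sum_j G(x_i - x_j, 2*sigma^2)). The code below assumes an isotropic Gaussian kernel and a user-chosen `sigma`; both are illustrative choices and may differ from the settings used in the paper.

```python
import math


def gaussian_kernel(diff, variance):
    """Isotropic Gaussian density with the given variance, evaluated at `diff`."""
    d = len(diff)
    sq_norm = sum(v * v for v in diff)
    norm_const = (2.0 * math.pi * variance) ** (-d / 2.0)
    return norm_const * math.exp(-sq_norm / (2.0 * variance))


def renyi_quadratic_entropy(points, sigma=1.0):
    """Parzen-window estimate of the quadratic Renyi entropy of `points`.

    points: list of equal-length numeric sequences.
    sigma: assumed kernel width of the Parzen window.
    """
    n = len(points)
    total = 0.0
    for xi in points:
        for xj in points:
            diff = [a - b for a, b in zip(xi, xj)]
            # Convolution of two Gaussians of variance sigma^2
            # yields one Gaussian of variance 2 * sigma^2.
            total += gaussian_kernel(diff, 2.0 * sigma ** 2)
    return -math.log(total / (n * n))
```

With this estimate a compact group of points has lower entropy than a widely scattered one, which is what makes it usable as a within-cluster measure.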

Within-Cluster Entropy:

Between-Cluster Entropy:

Improved Entropy for numerical data:

Methodology

• A generalized mechanism for categorical data…


Indiscernibility relation…

Complement Entropy:

Within-Cluster Entropy:

Improved Entropy for categorical data:

Between-Cluster Entropy:

Huang Dissimilarity for categorical data:
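The formulas for this slide are also missing, so the following is only a sketch of the standard complement entropy. Objects that agree on every chosen attribute are indiscernible and fall into the same equivalence class X_i; the complement entropy of the resulting partition of U is the sum over classes of (|X_i|/|U|) * (1 - |X_i|/|U|). The function name and the `attributes` parameter are assumptions for illustration, and the paper's within-cluster and between-cluster variants may apply this quantity differently (for example per attribute).

```python
from collections import Counter


def complement_entropy(rows, attributes=None):
    """Complement entropy of the partition induced by the chosen attributes.

    rows: list of tuples of categorical values (one tuple per object).
    attributes: indices of the attributes defining indiscernibility;
                defaults to all attributes (an assumed convention).
    """
    if not rows:
        return 0.0
    n = len(rows)
    if attributes is None:
        attributes = range(len(rows[0]))
    attributes = list(attributes)
    # Objects with identical values on the chosen attributes are indiscernible.
    class_sizes = Counter(tuple(row[a] for a in attributes) for row in rows)
    # Each class contributes (share of objects) * (share of its complement).
    return sum((size / n) * (1.0 - size / n) for size in class_sizes.values())
```

For `rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "x")]`, `complement_entropy(rows, [0])` is `0.5` and `complement_entropy(rows)` is `0.625`; a set of identical objects has complement entropy 0.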

Methodology

• A generalized mechanism for mixed data set…


Methodology

• Cluster validity index for mixed data…


For numerical data…

For categorical data…

For mixed data…


Experiments

• Ten Cluster


Experiments

• STUDENT


Experiments

• Real data sets…


Experiments

• Wine, Breast


Experiments

• Voting, Car


Experiments

• DNA, TAE


Experiments

• Heart, Credit


Experiments

• CMC, Adult



Conclusions

• The generalized mechanism and the improved k-prototypes algorithm can cluster mixed data sets effectively and determine the optimal number of clusters.


Comments

• Advantages: the entropy-based mechanism can be applied to mixed data sets.

• Applications: clustering of mixed-type data.
