Technical University of Berlin
Gaussian Process Clustering Project in Artificial Intelligence and Machine Learning WS 2013/14
Fangzhou Yang (352040)
Jing Cao (352030)
2014-4-17
Project in Artificial Intelligence and Machine Learning Clustering Algorithm based on Gaussian Process
Contents

1. Introduction
2. Background -- Gaussian Process (GP)
   2.1 Gaussian Process
   2.2 Gaussian Process for Regression
   2.3 Support Estimate from Gaussian Process
3. Gaussian Process for Clustering
   3.1 Clustering Based on the Variance Function
   3.2 Dynamical System for Cluster Characterization
   3.3 GP Clustering Algorithm
4. Implementation of GPC Package
   4.1 Gaussian Process Clustering Algorithm
   4.2 Measures for Clustering Performance
   4.3 Visualization
5. Test & Evaluation
   5.1 Datasets
   5.2 GP Clustering Algorithm Testing
      5.2.1 Test with R15 Dataset
      5.2.2 Test with Spiral Dataset
      5.2.3 Test with Iris Dataset
   5.3 Evaluation
6. A Clustering Application for Location-Based Data
7. Conclusion
8. Acknowledgements
9. References
1. Introduction
In 1994, when the field of neural networks was maturing and models were growing in complexity, researchers found that certain neural networks converge to a Gaussian process as their size tends to infinity, which made the Gaussian process a good candidate for simplifying practical machine learning problems [1]. Since then, Gaussian processes have been widely used and have shown great advantages in supervised learning problems such as regression and classification. This raises the question of whether Gaussian processes can also be efficient at solving unsupervised problems such as clustering.
This report summarizes our project on this question. The task of the project is to develop an understanding of the idea of clustering with Gaussian process models, following the work of Hyun-Chul Kim and Jaewook Lee [2]. Based on that, we implement a Gaussian Process Clustering Python package and perform clustering tests with different datasets.

The report is organized as follows. Section 2 describes the definition and main properties of the Gaussian process. Section 3 introduces the clustering algorithm based on Gaussian processes proposed by Hyun-Chul Kim and Jaewook Lee [2]. Section 4 focuses on our implementation of the Gaussian Process Clustering (GPC) package based on this algorithm. In Section 5 we test and evaluate the clustering performance of the algorithm on different data sets. Section 6 presents a demo application of the algorithm. Finally, the project is concluded in Section 7, and the acknowledgements and references are given in the last two sections.
2. Background -- Gaussian Process (GP)
In this section, some basic knowledge of Gaussian process is introduced to ease the discussion of
the clustering algorithm based on Gaussian process.
2.1 Gaussian Process

A Gaussian process is defined as follows:

Definition: A Gaussian process is a collection of random variables, any finite number of which have (consistent) joint Gaussian distributions. [3]

Several properties of Gaussian processes make them widely applicable and easy to analyze. First, a Gaussian process can approximately describe many natural phenomena, which makes it a basic model for abstracting practical problems. Second, it has many convenient algebraic properties: a linear combination of Gaussian random variables still follows a Gaussian distribution, and a Gaussian process is fully specified by its mean function and covariance function [3]. These properties simplify calculation and analysis.
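Because a Gaussian process is fully specified by its mean and covariance functions, drawing sample functions from a GP prior only requires evaluating a covariance matrix. The following sketch (our own illustration, not part of the GPC package; the helper name rbf_cov is hypothetical) draws three samples from a zero-mean GP with a squared-exponential covariance:

```python
import numpy as np

def rbf_cov(X1, X2, v0=1.0, length=1.0):
    # Squared-exponential covariance; together with the zero mean it fully
    # specifies the GP.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return v0 * np.exp(-0.5 * d2 / length ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)[:, None]
K = rbf_cov(x, x) + 1e-8 * np.eye(50)   # small jitter for numerical stability
samples = rng.multivariate_normal(np.zeros(50), K, size=3)
print(samples.shape)  # (3, 50): three sample functions at 50 input points
```

Each row of `samples` is one function drawn from the prior; a smoother or rougher prior follows directly from the `length` hyperparameter.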
2.2 Gaussian Process for Regression

Since a Gaussian process is fully specified by its mean function and covariance function, we first review Gaussian process regression to obtain these functions [4]. Suppose the training data points x_i ∈ R^d with continuous target values t_i form a data set D. The regression problem is then to find the predictive distribution of the target value t̃ for a new data point x̃. The target function f represents the mapping t̃ = f(x̃).

Gaussian process regression assumes that f has a Gaussian process prior, so the density of any collection of target function values is modeled as a multivariate Gaussian density. The covariance matrix C_N has entries C_ij = C(x_i, x_j; Θ), a parameterized function with hyperparameters Θ. Given C_N, the covariance matrix for the N + 1 points including x̃ is

C_{N+1} = \begin{pmatrix} C_N & k \\ k^T & c \end{pmatrix}    (2.2.1)

where k = [C(x̃, x_1; Θ), C(x̃, x_2; Θ), …, C(x̃, x_N; Θ)]^T and c = C(x̃, x̃; Θ). The predictive variance is then

σ²(x̃) = c − k^T C_N^{-1} k.    (2.2.2)

The covariance (kernel) function is

C(x_i, x_j; Θ) = v_0 exp{ −(1/2) Σ_{m=1}^{d} l_m (x_i^m − x_j^m)² } + v_1 + δ_ij v_2    (2.2.3)

with hyperparameters Θ = {l_m | m = 1, …, d} ∪ {v_0, v_1, v_2}, which can be learned from the data.
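A minimal sketch of how the predictive variance of equation (2.2.2) can be computed, assuming the covariance function of equation (2.2.3) with a single shared length-scale (the helper names cov and predictive_variance are ours, not the package's API):

```python
import numpy as np

def cov(xi, xj, v0=1.0, l=1.0, v1=0.0, v2=1.0, same=False):
    # Eq. (2.2.3) with one shared length-scale l; the delta term v2 only
    # contributes when the two arguments are the same data point.
    val = v0 * np.exp(-0.5 * l * np.sum((xi - xj) ** 2)) + v1
    return val + v2 if same else val

def predictive_variance(x_new, X, C_inv):
    # sigma^2(x) = c - k^T C^{-1} k   (Eq. 2.2.2)
    k = np.array([cov(x_new, xi) for xi in X])
    c = cov(x_new, x_new, same=True)
    return c - k @ C_inv @ k

X = np.random.default_rng(1).normal(size=(30, 2))        # toy 2-d data
C = np.array([[cov(xi, xj, same=(i == j)) for j, xj in enumerate(X)]
              for i, xi in enumerate(X)])
C_inv = np.linalg.inv(C)

x_dense = X.mean(axis=0)                  # inside the data cloud
x_far = x_dense + np.array([10.0, 10.0])  # far away from all data
# The variance is small in the dense area and approaches c far from the data.
print(predictive_variance(x_dense, X, C_inv) < predictive_variance(x_far, X, C_inv))  # True
```

This density-tracking behavior of the variance is exactly what the next subsection exploits.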
2.3 Support Estimate from Gaussian Process

Section 2.2 gives the variance function of a Gaussian process. Observing a two-dimensional data set as in Figure 1, it is easy to see that the variances of the predictive values are smaller in denser areas and larger in sparse areas [5]. This means the variance indicates the density of the input data points. Since it is common to apply the support of a probability density function to clustering problems, we can take the variance function as a good estimate of the support of a probability density.
Figure 1 Examples of Gaussian process regression. Variance of the predictive value is related to the density
3. Gaussian Process for Clustering
In this section we introduce the clustering algorithm based on the observation of the variance function, together with the relevant mathematical analysis and explanations for the key definitions [2].

3.1 Clustering Based on the Variance Function

Let the cutting level r* represent the contours of the clusters, initially set as

r* = max_k σ²(x_k)    (3.1.1)

A cluster is then given by a connected component of {x : σ²(x) ≤ r*}, and the data set can be decomposed into several disjoint connected sets

L(r*) := {x : σ²(x) ≤ r*} = C_1 ∪ … ∪ C_P    (3.1.2)

Applying a Gaussian process to the clustering problem therefore amounts to developing an algorithm that separates the input data points into these disjoint connected sets.
3.2 Dynamical System for Cluster Characterization

The algorithm relies on a few supporting lemmas. In this section we describe how to assign a data point to its corresponding cluster.

A dynamical system is built as follows:

dx/dt = F(x) := −∇σ²(x)    (3.2.1)

For each initial state x(0) = x_0, this system has a unique time evolution (trajectory). A state vector x̄ ∈ R^n satisfying F(x̄) = 0 is called an equilibrium point of the dynamical system; if all eigenvalues of the Jacobian of F at x̄ have negative real parts (equivalently, the Hessian of σ² at x̄ is positive definite), x̄ is called an (asymptotically) stable equilibrium point (SEP).

Figure 2 The partitioning property of the proposed clustering algorithm. (The dashed lines represent the basin boundaries separating the clusters, and the arrows represent the direction of the system trajectories.)

As Figure 2 shows, the vector field F(x) in equation (3.2.1) is orthogonal to the contour {y : σ²(y) = r*} and points inward, so every trajectory remains within one of the clusters. This property leads to two lemmas on which the clustering algorithm relies.
Lemma 1 For any given level value r > 0, each connected component of the level set L(r) ={x: 𝜎2(𝑥) ≤ r} is positively invariant, that is, if a point is on a connected component of L(r), then its entire positive trajectory lies on the same component.
Lemma 2 The trajectory of system (3.2.1) approaches one of the equilibrium points of the equation. In particular, almost every trajectory approaches one of the stable equilibrium points of equation (3.2.1).

Lemma 1 states that if any point on a trajectory lies within one cluster, then all other points on the same trajectory lie within the same cluster. Lemma 2 states that several trajectories will finally approach the same stable equilibrium point, which can be found through equation (3.2.1). These two lemmas give the basic idea of the GP clustering algorithm: the problem of assigning all data points to their clusters can be reduced to assigning the relevant stable equilibrium points to the clusters.

The remaining question is how to assign the stable equilibrium points to the clusters. If two SEPs are located in different clusters, then the straight line segment connecting them must contain a point y with σ²(y) > r*; if there is no such point on the segment, the two SEPs belong to the same cluster. Based on this idea, an adjacency matrix A is built to support the algorithm.
A_ij = 1, if σ²(y) ≤ r* for every point y on the segment between s_i and s_j;
A_ij = 0, otherwise.    (3.2.2)

Here s_i and s_j denote the i-th and j-th SEPs.
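The segment test of equation (3.2.2) can be sketched as follows. The variance function here is a hand-made toy surface with a high-variance ridge between two basins, not a real GP variance, and the function name connected is our own:

```python
import numpy as np

def connected(s_i, s_j, var, r_star, n_checkpoints=20):
    # A_ij = 1 iff sigma^2(y) <= r* for every checkpoint y on the
    # segment from s_i to s_j   (Eq. 3.2.2)
    ts = np.linspace(0.0, 1.0, n_checkpoints)
    return int(all(var(s_i + t * (s_j - s_i)) <= r_star for t in ts))

# Toy variance: low near x[0] = -2 and x[0] = 2, with a ridge at x[0] = 0.
var = lambda x: 1.0 - np.exp(-(x[0] - 2) ** 2) - np.exp(-(x[0] + 2) ** 2)
r_star = 0.9

print(connected(np.array([2.0, 0.0]), np.array([-2.0, 0.0]), var, r_star))  # 0: ridge crossed
print(connected(np.array([2.0, 0.0]), np.array([2.0, 1.0]), var, r_star))   # 1: same basin
```

The first pair straddles the high-variance ridge, so some checkpoint exceeds r* and the SEPs land in different clusters; the second pair stays inside one basin.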
3.3 GP Clustering Algorithm

In conclusion, the GP clustering algorithm can be stated in the following four steps:

Step 1: For given unlabeled training data X = {x_i | i = 1, 2, …, N}, construct the variance function σ²(·) of equation (2.2.2) and compute the level value r* = max_{x_i ∈ X} σ²(x_i).

Step 2: Using each data point as an initial value, apply the dynamical system of equation (3.2.1) to find its corresponding stable equilibrium point, denoted s_i, i = 1, …, p. Let X_i be the set of training data points that converge to the stable equilibrium point s_i, for each i = 1, …, p.

Step 3: For each pair of stable equilibrium points s_i, s_j, i, j = 1, …, p, define the adjacency matrix A of equation (3.2.2), with A_ij = 1 if σ²(y) ≤ r* for every point y on the segment between s_i and s_j, and A_ij = 0 otherwise. Assign the same cluster index to the stable equilibrium points in the same connected component of the graph induced by A.

Step 4: For each i = 1, …, p, assign the same cluster label to all the training data points in X_i.
4. Implementation of GPC Package
In this section, we describe our implementation of the Gaussian Process Clustering Package, which contains the four basic steps of the Gaussian process clustering algorithm, three measures of clustering performance, and some visualization functions, including a method for PCA-based [6] high-dimensional visualization [7]. The package is implemented in Python and depends on SciPy [8], NumPy [9] and matplotlib [10].

The following three subsections explain some important interfaces and methods of the package; more specific implementation details and usage can be found in our code.
4.1 Gaussian Process Clustering Algorithm

The Gaussian process clustering algorithm is implemented in the four basic steps as follows:

a. Construct the covariance function and covariance matrix:

The package uses the commonly used covariance function of equation (2.2.3):

C(x_i, x_j) = v_0 exp{ −(1/2) Σ_{m=1}^{d} l_m (x_i^m − x_j^m)² } + v_1 + δ_ij v_2

The hyperparameters v_0, v_1, v_2 and l_m can be set in the package to construct a proper variance function. After the covariance function is constructed, the covariance matrix can be calculated with the method C = getCovarianceMatrix(data).
b. Compute stable equilibrium points

The function sep = getEquilibriumPoint(x, data, inv, ita, maxIteration) computes the SEP (stable equilibrium point) for each data point in the dataset. We use gradient descent to generate the trajectory of the dynamical system, so the convergence point of the gradient descent is the stable equilibrium point that the generated trajectory approaches. Here, sep is the stable equilibrium point of the data point x, inv is the inverse of the covariance matrix C, and ita is the step size of the gradient descent.

Because there are small numerical errors in the computation of each SEP, the coordinates of SEPs that should coincide are not exactly the same. The function reduceSEPs(seps, min_accepted_covariance) is designed to merge such SEPs into one; it returns a reduced SEP list and an index map from the data points to the reduced SEP list.
To check whether two SEPs are the same, we use the constructed covariance function C(x_i, x_j): the closer two SEPs are, the larger their covariance. The parameter min_accepted_covariance serves as the merging criterion: if the covariance of two points is larger than min_accepted_covariance, we treat them as one SEP.
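The SEP search can be sketched as plain gradient descent on the variance function. The sketch below uses a numerical gradient and a toy variance surface with minima near x[0] = ±2; the package's getEquilibriumPoint uses the analytic kernel gradient instead, and the names find_sep and num_grad are our own:

```python
import numpy as np

def num_grad(f, x, eps=1e-5):
    # Central-difference gradient (the package differentiates the kernel analytically).
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def find_sep(x0, var, eta=0.1, max_iter=500, tol=1e-6):
    # Follow dx/dt = -grad sigma^2(x) by explicit gradient descent (Eq. 3.2.1).
    x = x0.copy()
    for _ in range(max_iter):
        g = num_grad(var, x)
        if np.linalg.norm(g) < tol:
            break
        x = x - eta * g
    return x

# Toy variance surface with two basins, minima near x[0] = -2 and x[0] = 2.
var = lambda x: 1.0 - np.exp(-(x[0] - 2) ** 2) - np.exp(-(x[0] + 2) ** 2)
sep = find_sep(np.array([1.2, 0.0]), var)
print(np.round(sep[0], 2))  # 2.0: the trajectory converges to the nearby minimum
```

Starting points in the same basin converge to (numerically) the same SEP, which is exactly why the covariance-based merging in reduceSEPs is needed afterwards.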
c. Construct the adjacency matrix

The function getAdjacencyMatrix(sepList, maxVar, pointsNumPerDistanceUnit, data, invA) computes the adjacency matrix A. We pick some checkpoints between each pair of SEPs i and j in the SEP list; if the variances of all checkpoints are smaller than the cutting level maxVar, we conclude that SEPs i and j are in the same cluster and set A[i][j] and A[j][i] to 1. Because all pairs of SEPs are checked, a large number of checkpoints per pair can become a computational bottleneck. However, if the number of checkpoints is too small, then for pairs whose endpoints are far apart the checkpoints are spaced too loosely and might miss points whose variance exceeds the cutting level.

Our implementation compromises between these two problems: the number of checkpoints between two points is no longer fixed but varies with the geometric distance between them. The shorter the distance, the fewer checkpoints are needed to make the decision. In this way we save a lot of computation on unnecessary checkpoints.
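The distance-adaptive number of checkpoints can be sketched as follows. The parameter points_per_unit mirrors the role of pointsNumPerDistanceUnit; the helper name checkpoints and the minimum of three points are our own choices:

```python
import numpy as np

def checkpoints(s_i, s_j, points_per_unit=10, min_points=3):
    # Scale the number of checkpoints with the segment length, so short
    # segments are checked cheaply and long segments densely enough.
    n = max(min_points, int(np.ceil(np.linalg.norm(s_j - s_i) * points_per_unit)))
    return [s_i + t * (s_j - s_i) for t in np.linspace(0.0, 1.0, n)]

print(len(checkpoints(np.zeros(2), np.array([0.2, 0.0]))))  # 3: short segment, few points
print(len(checkpoints(np.zeros(2), np.array([5.0, 0.0]))))  # 50: long segment, many points
```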
d. Assign cluster labels

The function getSEPsClusters(adjacencyMatrix, sepList) computes the clusters of SEPs from the adjacency matrix. Because all data points that approach the same SEP belong to the same cluster (see the lemmas in Section 3.2), we can then easily assign a cluster label to each data point, which is implemented by the function getPointClusters(sepsClusters, sepIndexMap).
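Steps 3 and 4 can be sketched as a connected-components pass over A followed by a lookup through the SEP index map. This is a sketch of the idea, not the package's actual code; the name seps_clusters is our own:

```python
import numpy as np

def seps_clusters(A):
    # Label SEPs by connected components of the graph induced by A (Step 3),
    # using an iterative depth-first search.
    n = len(A)
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        while stack:
            i = stack.pop()
            if labels[i] != -1:
                continue
            labels[i] = cluster
            stack.extend(j for j in range(n) if A[i][j] and labels[j] == -1)
        cluster += 1
    return labels

A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])                     # SEPs 0 and 1 connected, SEP 2 alone
sep_labels = seps_clusters(A)
sep_index_map = [0, 0, 1, 1, 2, 2]            # data point -> SEP index
point_labels = [sep_labels[s] for s in sep_index_map]  # Step 4
print(point_labels)  # [0, 0, 0, 0, 1, 1]
```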
4.2 Measures for Clustering Performance

To measure the clustering performance, we implement three measures: the reference error rate RE, the cluster error rate CE, and the F score [2]. We define:

n_{r,c} – the number of data points that belong to reference cluster r and are assigned to cluster c by a clustering algorithm (n_{r,c} can be calculated via the function getCountNrc(references, clusters, referencesNum, clustersNum))

n – the total number of data points

n_r – the number of data points in reference cluster r

n_c – the number of data points in cluster c obtained by a clustering algorithm
a. Reference error rate:

RE = 1 − (Σ_r max_c n_{r,c}) / n    (4.2.1)

The RE can be calculated via the function getRE(references, clusters, referencesNum, clustersNum, Nrc).

b. Cluster error rate:

CE = 1 − (Σ_c max_r n_{r,c}) / n    (4.2.2)

The CE can be calculated via the function getCE(references, clusters, referencesNum, clustersNum, Nrc).
c. F score:

The F score of reference cluster r and cluster c is defined as

F_{r,c} = 2 R_{r,c} P_{r,c} / (R_{r,c} + P_{r,c})

where R_{r,c} = n_{r,c} / n_r is the recall and P_{r,c} = n_{r,c} / n_c is the precision.

The F score of reference cluster r is the maximum F score over all clusters:

F_r = max_c F_{r,c}

The overall F score is defined as

F = Σ_r (n_r / n) F_r    (4.2.3)

In general, the higher the F score (at most 1), the better the clustering result. The F score of a clustering result can be obtained with the package function getFScore(references, clusters, referencesNum, clustersNum, Nrc).
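On a tiny labeled example, the three measures can be computed directly from the n_{r,c} counts. This is a sketch of equations (4.2.1)-(4.2.3), not the package functions themselves; the helper name count_nrc is our own:

```python
import numpy as np

def count_nrc(refs, clusters, n_ref, n_clu):
    # n_{r,c}: points in reference cluster r assigned to cluster c.
    N = np.zeros((n_ref, n_clu), dtype=int)
    for r, c in zip(refs, clusters):
        N[r, c] += 1
    return N

refs     = [0, 0, 0, 1, 1, 1]      # reference labels
clusters = [0, 0, 1, 1, 1, 1]      # labels produced by a clustering algorithm
N = count_nrc(refs, clusters, 2, 2)
n = len(refs)

RE = 1 - N.max(axis=1).sum() / n                  # Eq. (4.2.1)
CE = 1 - N.max(axis=0).sum() / n                  # Eq. (4.2.2)
R = N / N.sum(axis=1, keepdims=True)              # recall    n_rc / n_r
P = N / N.sum(axis=0, keepdims=True)              # precision n_rc / n_c
F = np.divide(2 * R * P, R + P, out=np.zeros_like(R), where=(R + P) > 0)
f_score = (N.sum(axis=1) / n * F.max(axis=1)).sum()   # Eq. (4.2.3)
print(round(RE, 3), round(CE, 3), round(f_score, 3))  # 0.167 0.167 0.829
```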
4.3 Visualization

To visualize the clustering result, we implement functions that plot data points and their cluster labels in a two-dimensional coordinate system, such as plotScatter(X, Y, title, c) and plotCluster(X, Y, clusters, title). For higher-dimensional datasets, we implement a PCA [6] function pca(data, nRedDim), so that we can project high-dimensional data onto the two most important principal components and plot the data points and their cluster labels in the projected coordinates [7].
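The PCA projection can be sketched via the SVD of the centred data. This is a sketch of what pca(data, nRedDim) does, under the assumption that it projects onto the leading principal components; the actual package implementation may differ in detail:

```python
import numpy as np

def pca(data, n_red_dim=2):
    # Project centred data onto its first n_red_dim principal components.
    X = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_red_dim].T

rng = np.random.default_rng(0)
X4 = rng.normal(size=(150, 4))    # stand-in for the 4-d Iris data
X2 = pca(X4)
print(X2.shape)  # (150, 2): ready for a 2-d scatter plot
```

The first projected dimension carries the most variance, the second the next most, so a 2-d scatter of the output shows as much of the data's spread as any linear projection can.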
5. Test & Evaluation
In this section, we test the Gaussian process clustering algorithm using our implemented package, evaluate its clustering performance, and compare it with a reference clustering algorithm, K-Means clustering [11].
5.1 Datasets

The datasets used for algorithm testing come from the clustering datasets of the University of Eastern Finland (Joensuu) [12]: two two-dimensional datasets, R15 and the Spiral shape set (see Figure 3), and one four-dimensional dataset, the Iris dataset. All these datasets come with reference cluster labels, so we can evaluate our clustering results with measures such as the reference error rate, cluster error rate and F score.
Figure 3 R15 and Spiral Shape Sets
5.2 GP Clustering Algorithm Testing
5.2.1 Test with R15 Dataset
To get the clustering result of the Gaussian process clustering algorithm, we once more follow the four basic steps introduced in the previous sections.

a. Construct the covariance function and covariance matrix:

To construct the default covariance function in our package, we need to determine the values of the hyperparameters {v_0, v_1, v_2, l_m}. Usually we set v_0 and v_2 to 1 and v_1 to 0, and the parameters l_m, m ∈ {1, …, d}, can be regarded as weights for each dimension in the covariance function. For the dataset R15, the data points have similar variance in both dimensions, so we set the same value α for both l_1 and l_2. However, different values of α affect the variance of each data point differently. Figure 4 shows the heat map of the variance for different values of α.
Figure 4 Heat Map of Variance with different 𝛼
As we can see, the value of α determines the distribution of the Gaussian process variance over the data points. The higher the value of α, the sharper and rougher the distribution of the variance; the lower the value of α, the smoother and vaguer the distribution. What we want is a distribution that is neither too sharp nor too smooth, so that we can easily determine a cutting level that yields a good clustering result. In this case we set α = 1 and set the cutting level to a value between 0.5 and 0.6.

Once the covariance function is determined, we can easily compute the covariance matrix using the method getCovarianceMatrix in the package.
b. Compute stable equilibrium points

The stable equilibrium point of each data point can be calculated, and coinciding SEPs merged, with the package functions described above. The following figure shows a scatter plot of the reduced SEPs:
Figure 5 Reduced SEPs Scatter Plot
As we can see, the original 600 data points have been reduced to 98 SEPs. According to the lemmas above, each data point approaches one SEP, and all data points belonging to the same SEP are in the same cluster.
c. Construct the adjacency matrix

To construct the adjacency matrix, we need to determine the cutting level maxVar; different cutting levels lead to different clustering results, which we present at the end of this subsection. Here we simply set it to 0.52.

After the adjacency matrix is constructed, we can get the clusters of SEPs. The following figure shows the scatter plot of the clustered SEPs.
Figure 6 Scatter Plot of SEPs
d. Assign cluster labels

Here we simply assign the cluster label of each SEP to its corresponding data points. The clustering result is shown in the following figure; each cluster is marked by a different color.

Figure 7 Clustering Result (Scatter Plot – GP Clustering, maxVar = 0.52)
As mentioned, different cutting levels lead to different clustering results; more precisely, to different numbers of clusters.

Figure 8 Clustering results with different cutting levels

Figure 9 Relationship between the number of clusters and the cutting level

Figure 8 shows the clustering results for three different cutting levels. The data points in the middle receive a totally different cluster assignment. Figure 9 shows that the higher the cutting level, the more clusters appear in the clustering result.
5.2.2 Test with Spiral Dataset

The Spiral dataset is quite similar to the R15 dataset but has a different shape: the data points of one cluster are linked closely one by one and stretch out as a spiral, instead of staying together in a small group.

The testing approach is very similar to that for R15, so we will not go into the details again. The following figure shows the clustering result for the Spiral dataset. As we can see, the data is clustered perfectly by Gaussian process clustering.

Figure 10 Scatter Plot – GP Clustering, maxVar = 0.99
5.2.3 Test with Iris Dataset

The Iris dataset has four dimensions. To visualize its distribution, we use principal component analysis and project all data points onto the two most important principal components.

Figure 11 PCA – 1st and 2nd components
The figure above shows the distribution of the data points projected onto the first and second principal components. The three colors in the figure correspond to the three reference cluster labels.

To get a clustering result with Gaussian process clustering, we follow the same approach as above. One difference is that we have to set different parameters for different dimensions in the covariance function. It is also more difficult to find proper hyperparameters, because more dimensions mean more parameters to determine.

Figure 12 GP Clustering, PCA – 1st and 2nd components

Figure 12 shows the clustering result for the Iris dataset. Compared to the reference clusters shown in Figure 11, the data points of the red reference cluster are well clustered. Although some points of the green and blue reference clusters are wrongly clustered, the overall clustering result for the Iris dataset is good.
5.3 Evaluation

To evaluate the clustering results, we use the three measures implemented in our package: the reference error rate, the cluster error rate and the F score. Table 1 shows the evaluation of the clustering result for the Iris dataset.

Table 1 Evaluation result for the Iris dataset

                CE      RE      F Score
GP Clustering   0.027   0.233   0.853

As we can see, although the reference error rate is larger than 0.2, the cluster error rate is quite small. The F score shows a good overall clustering performance, which means the Gaussian process clustering algorithm works well on the Iris dataset.
To get a better perspective on the performance of the Gaussian process clustering algorithm, we implement a simple two-dimensional K-Means clustering algorithm as a reference, run the same clustering tests on the R15 and Spiral datasets with K-Means, and then evaluate the results of both algorithms with the three measures. The results are presented in the following figures.

Figure 13 R15 Dataset

Figure 14 Spiral Dataset
As Figure 13 shows, both Gaussian process clustering and K-Means clustering perform very well on the R15 dataset. Comparing the F scores, K-Means even has a slightly better result. But this does not mean K-Means performs much better than Gaussian process clustering: the good K-Means result relies on precise prior knowledge of the number of clusters (K = 15), whereas in real cases we usually do not know how many clusters there should be. In contrast, Gaussian process clustering provides a much more flexible strategy for changing the number of clusters by modifying the cutting level.

In Figure 14, we see a perfect clustering result for Gaussian process clustering on the Spiral dataset, while the K-Means result is poor even though precise prior knowledge of K is given. This shows another advantage of Gaussian process clustering: it can detect clusters with arbitrarily complex shapes.
6. A Clustering Application for Location-Based Data

In this section, we briefly present an application of Gaussian process clustering to location-based data.

The basic idea of this application is to cluster schools based on their locations. Schools, as basic infrastructure facilities in a city, are distributed according to the distribution of the population, more specifically according to the distribution of city blocks and regions. These blocks and regions may have complex shapes depending on the terrain. What we want is to discover such blocks or districts from the clusters of schools.

The dataset is the location of ACT (Australian Capital Territory) schools [13], which contains 132 schools in Canberra. The distribution of the schools in the city can be seen in Figure 15. In the dataset, each school has a post code in addition to its location. Since post codes are assigned according to administrative blocks or regions, we expect schools in one cluster to have the same or similar post codes.
Figure 15 Distribution of ACT Schools
Figure 16 shows the clustering results for different cutting levels. Different cutting levels lead to different clustering results, which gives us a flexible strategy for determining the number of clusters and the average cluster size. In this case, different cluster sizes have different practical meanings: the clusters in the left subfigure (maxVar = 0.1) are close to city blocks, the clusters in the middle subfigure (maxVar = 0.2) are more like super-blocks, and the clusters in the right subfigure (maxVar = 0.3) can be regarded as city regions.

Besides, as these subfigures show, Gaussian process clustering can detect complex cluster shapes. Schools that are close to each other are assigned to one cluster, no matter whether the terrain is narrow and long, wide and short, or even a circle around a lake.
Figure 16 Clustering result for the ACT schools with different cutting levels
7. Conclusion
In this project, we developed an understanding of the idea of clustering with Gaussian process models, following the work of Hyun-Chul Kim and Jaewook Lee [2]. Based on that, we implemented a Gaussian Process Clustering Python package and performed clustering tests with different datasets.

In Gaussian process clustering, the variance function is applied to construct a set of contours that enclose the data points and correspond to cluster boundaries. A dynamical system associated with the variance function is built and applied to the cluster labeling of the data points. The results of our clustering tests show that the Gaussian process clustering algorithm can detect clusters with arbitrarily complex shapes as well as high-dimensional clusters. Moreover, Gaussian process clustering provides a flexible strategy for changing the number of clusters and the average cluster size by modifying the cutting level, which enables us to cluster data sets with overlapping clusters and to control the number of clusters.
In our clustering tests, we had to set proper hyperparameters to get good clustering results. As shown in the previous tests, the covariance function controls the distribution of the Gaussian process variance, while the cutting level controls the number of clusters and the average cluster size. It is important to estimate proper hyperparameters; however, finding the optimal hyperparameters is not easy, and the complexity of determining them increases as the dimension of the dataset grows. Developing a robust strategy for setting proper hyperparameters is therefore one direction of future work for the Gaussian process clustering algorithm. Besides, the Gaussian process clustering algorithm is time-consuming; strategies to speed it up could be developed in the future, so that the algorithm not only delivers good clustering performance but is also fast and efficient.
8. Acknowledgements
This project is one of the projects in Machine Learning and Artificial Intelligence WS 2013/14, supported by Lab KI [14] and Professor Dr. Manfred Opper. Special thanks to our supervisors Florian Stimberg and Andreas Ruttor for their support and help in this project.
9. References
[1] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006.
[2] H.-C. Kim and J. Lee, “Clustering based on Gaussian processes,” Neural Comput., vol. 19, no. 11, pp. 3088–3107, Nov. 2007.
[3] C. E. Rasmussen, “Gaussian Processes in Machine Learning.”
[4] C. E. Rasmussen, “Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression,” PhD thesis, University of Toronto, 1996.
[5] H.-C. Kim and J. Lee, “Pseudo-density Estimation for Clustering with Gaussian Processes,” in Advances in Neural Networks - ISNN 2006 SE - 183, vol. 3971, J. Wang, Z. Yi, J. Zurada, B.-L. Lu, and H. Yin, Eds. Springer Berlin Heidelberg, 2006, pp. 1238–1243.
[6] M. Ringnér, “What is principal component analysis?,” Nat. Biotechnol., vol. 26, no. 3, pp. 303–4, Mar. 2008.
[7] G. Grinstein, M. Trutschl, and U. Cvek, “High-Dimensional Visualizations.”
[8] “SciPy.org — SciPy.org.” [Online]. Available: http://www.scipy.org/. [Accessed: 17-Apr-2014].
[9] “NumPy — Numpy.” [Online]. Available: http://www.numpy.org/. [Accessed: 17-Apr-2014].
[10] “matplotlib: python plotting — Matplotlib 1.3.1 documentation.” [Online]. Available: http://matplotlib.org/. [Accessed: 17-Apr-2014].
[11] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An efficient k-means clustering algorithm: analysis and implementation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 881–892, Jul. 2002.
[12] “Clustering datasets.” [Online]. Available: http://cs.joensuu.fi/sipu/datasets/. [Accessed: 12-Apr-2014].
[13] “ACT School Locations,” ACT Government. [Online]. Available: https://www.data.act.gov.au/Education/ACT-School-Locations/q8rt-q8cy. [Accessed: 13-Apr-2014].
[14] “Methoden der Künstlichen Intelligenz.” [Online]. Available: http://www.ki.tu-berlin.de/menue/methoden_der_kuenstlichen_intelligenz/. [Accessed: 13-Apr-2014].