Pattern Recognition: 8. Unsupervised Analysis

Unsupervised Analysis
• Previously, all our training samples were labeled; learning from such samples is said to be "supervised".
• We now investigate a number of "unsupervised" procedures which use unlabeled samples.
• Collecting and labeling a large set of sample patterns can be costly.
• We can train with a large amount of (less expensive) unlabeled data, and only then use supervision to label the groupings found. This is appropriate for large "data mining" applications where the contents of a large database are not known beforehand.
• This is also appropriate in many applications where the characteristics of the patterns can change slowly with time.
• Improved performance can be achieved if classifiers running in an unsupervised mode are used.
• We can use unsupervised methods to identify features that will then be useful for classification.
• We gain some insight into the nature (or structure) of the data.
Mixture Densities & Identifiability

• We shall begin with the assumption that the functional forms for the underlying probability densities are known and that the only thing that must be learned is the value of an unknown parameter vector.
• We make the following assumptions:
- The samples come from a known number c of classes.
- The prior probabilities P(ωj) for each class are known (j = 1, …, c).
- The forms of the densities P(x | ωj, θj) (j = 1, …, c) are known.
- The values of the c parameter vectors θ1, θ2, …, θc are unknown.
• The category labels are unknown.
• Under the above assumptions, the density for the samples is

  P(x | θ) = Σ_{j=1}^{c} P(x | ωj, θj) P(ωj),   where θ = (θ1, θ2, …, θc)^t

  The conditional densities P(x | ωj, θj) are the component densities, and the prior probabilities P(ωj) are the mixing parameters.
• This density function is called a mixture density. Our goal will be to use samples drawn from this mixture density to estimate the unknown parameter vector θ. Once θ is known, we can decompose the mixture into its components and use a MAP classifier on the derived densities.
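Once the component form is fixed, the mixture density above can be evaluated directly; below is a minimal sketch for one-dimensional Gaussian components (the function name and parameterization are illustrative choices, not from the slides):

```python
import numpy as np

def mixture_density(x, priors, means, sigmas):
    """Evaluate P(x | theta) = sum_j P(x | w_j, theta_j) P(w_j)
    for a 1-D mixture whose component densities are normals."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for p, mu, s in zip(priors, means, sigmas):
        # p is the mixing parameter P(w_j); the rest is the component density
        total += p * np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return total

# Two equally likely unit-variance components, as in the examples that follow.
px = mixture_density([0.0], priors=[0.5, 0.5], means=[-2.0, 2.0], sigmas=[1.0, 1.0])
```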
• Definition
- A density P(x | θ) is said to be identifiable if θ ≠ θ′ implies that there exists an x such that P(x | θ) ≠ P(x | θ′).
• As a simple example, consider the case where x is binary and P(x | θ) is the mixture

  P(x | θ) = (1/2) θ1^x (1 − θ1)^(1−x) + (1/2) θ2^x (1 − θ2)^(1−x)
           = (1/2)(θ1 + θ2)        if x = 1
           = 1 − (1/2)(θ1 + θ2)    if x = 0

• Assume that P(x = 1 | θ) = 0.6, so that P(x = 0 | θ) = 0.4. By substituting these probability values, we obtain θ1 + θ2 = 1.2, which fixes only the sum of the two parameters.
• Thus, we have a case in which the mixture distribution is completely unidentifiable, and therefore unsupervised learning is impossible.
• For discrete distributions, if there are too many components in the mixture, there may be more unknowns than independent equations, and identifiability can become a serious problem!
• While it can be shown that mixtures of normal densities are usually identifiable, the parameters in the simple mixture density

  P(x | θ) = (P(ω1)/√(2π)) exp[−(1/2)(x − θ1)²] + (P(ω2)/√(2π)) exp[−(1/2)(x − θ2)²]

  cannot be uniquely identified if P(ω1) = P(ω2): we cannot recover a unique θ even from an infinite amount of data!
• θ = (θ1, θ2) and θ = (θ2, θ1) are two possible vectors that can be interchanged without affecting P(x | θ).
• Although identifiability can be a problem, in what follows we always assume that the densities we are dealing with are identifiable.
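This label-swapping symmetry is easy to check numerically; a small sketch assuming equal priors and unit-variance components (the concrete values are chosen for illustration):

```python
import numpy as np

def equal_prior_mixture(x, t1, t2):
    # P(x | theta) with P(w1) = P(w2) = 1/2 and unit-variance normal components
    g = lambda m: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)
    return 0.5 * g(t1) + 0.5 * g(t2)

# Swapping (theta1, theta2) leaves the density unchanged at every x,
# so no amount of data can distinguish the two parameter vectors.
xs = np.linspace(-5.0, 5.0, 101)
same = np.allclose(equal_prior_mixture(xs, -2.0, 2.0),
                   equal_prior_mixture(xs, 2.0, -2.0))
```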
ML Estimates

• Suppose that we have a set D = {x1, …, xn} of n unlabeled samples drawn independently from the mixture density

  P(x | θ) = Σ_{j=1}^{c} P(x | ωj, θj) P(ωj)

  (θ is fixed but unknown!)

• The log-likelihood of the observed samples is l = Σ_{k=1}^{n} ln P(xk | θ), and its gradient with respect to θi is

  ∇θi l = Σ_{k=1}^{n} P(ωi | xk, θ) ∇θi ln P(xk | ωi, θi)

  with

  P(ωi | xk, θ) = P(xk | ωi, θi) P(ωi) / P(xk | θ)
• Since the gradient must vanish at the value of θi that maximizes l, the ML estimate θ̂i must satisfy the conditions

  Σ_{k=1}^{n} P̂(ωi | xk, θ̂) ∇θi ln P(xk | ωi, θ̂i) = 0   (i = 1, …, c)
• By including the prior probabilities as unknown variables, we finally obtain

  P̂(ωi) = (1/n) Σ_{k=1}^{n} P̂(ωi | xk, θ̂)

  and

  Σ_{k=1}^{n} P̂(ωi | xk, θ̂) ∇θi ln P(xk | ωi, θ̂i) = 0

  where

  P̂(ωi | xk, θ̂) = P(xk | ωi, θ̂i) P̂(ωi) / Σ_{j=1}^{c} P(xk | ωj, θ̂j) P̂(ωj)
Applications to Normal Mixtures

Case 1 = simplest case

  Case | μi | Σi | P(ωi) | c
   1   |  ? |  × |   ×   | ×
   2   |  ? |  ? |   ?   | ×
   3   |  ? |  ? |   ?   | ?

  (? = unknown, × = known)
• Case 1: Unknown mean vectors
• Case 1: Unknown mean vectors (continued)

  ML estimate:

  μ̂i = Σ_{k=1}^{n} P̂(ωi | xk, μ̂) xk / Σ_{k=1}^{n} P̂(ωi | xk, μ̂)     (1)

  P̂(ωi | xk, μ̂) is the fraction of those samples having value xk that come from the ith class, and μ̂i is the average of the samples coming from the ith class.
• Unfortunately, (1) does not give μ̂i explicitly, since μ̂i also appears on the right-hand side inside P̂(ωi | xk, μ̂).
• However, if we have some way of obtaining good initial estimates for the unknown means, (1) can be seen as an iterative process for improving the estimates.
• This is a gradient ascent for maximizing the log-likelihood function.
• Example: Consider the simple two-component one-dimensional normal mixture

  p(x | μ1, μ2) = (1/(3√(2π))) exp[−(1/2)(x − μ1)²] + (2/(3√(2π))) exp[−(1/2)(x − μ2)²]

  (2 clusters!)

• Let's set μ1 = −2, μ2 = 2 and draw 25 samples sequentially from this mixture.
• The log-likelihood function is

  l(μ1, μ2) = Σ_{k=1}^{25} ln p(xk | μ1, μ2)

• The maximum value of l occurs at estimates that are not far from the true values μ1 = −2 and μ2 = +2.
• There is another peak, with the roles of μ1 and μ2 roughly interchanged, which has almost the same height, as can be seen from the following figure.
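The mean-update iteration (1) can be run directly on this example (priors known, means unknown); a sketch in which the sample draws and the starting points are illustrative choices, so the exact estimates will vary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 25 samples from the mixture with P(w1) = 1/3, P(w2) = 2/3, unit variances
mu_true = np.array([-2.0, 2.0])
labels = rng.choice(2, size=25, p=[1/3, 2/3])
x = rng.normal(mu_true[labels], 1.0)

priors = np.array([1/3, 2/3])
mu = np.array([-1.0, 1.0])            # illustrative initial estimates
for _ in range(100):
    # posterior P(w_i | x_k, mu): component density times prior, normalized
    g = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) * priors
    post = g / g.sum(axis=1, keepdims=True)
    # update rule (1): weighted average of the samples
    mu = (post * x[:, None]).sum(axis=0) / post.sum(axis=0)
```

With 25 samples the estimates land near the true means but not exactly on them, matching the remark above.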
(Figure: the log-likelihood l(μ1, μ2) and its two peaks.)
• This mixture of normal densities is identifiable.
• When the mixture density is not identifiable, the ML solution is not unique.
• Case 2: All parameters unknown
- No constraints are placed on the covariance matrix.
- Let p(x | μ, σ²) be the two-component normal mixture

  p(x | μ, σ²) = (1/(2√(2π)σ)) exp[−(1/2)((x − μ)/σ)²] + (1/(2√(2π))) exp[−(1/2)x²]
• Suppose μ = x1; then

  p(x1 | μ, σ²) = 1/(2√(2π)σ) + (1/(2√(2π))) exp[−(1/2)x1²]

• For the rest of the samples,

  p(xk | μ, σ²) ≥ (1/(2√(2π))) exp[−(1/2)xk²]

• Finally,

  p(x1, …, xn | μ, σ²) ≥ {(1/σ) + exp[−(1/2)x1²]} (1/(2√(2π)))^n exp[−(1/2) Σ_{k=2}^{n} xk²]

  which grows without bound as σ → 0.

• The likelihood can therefore be made arbitrarily large, and the maximum-likelihood solution becomes singular.
• Adding an assumption: consider the largest of the finite local maxima of the likelihood function and use ML estimation there. We obtain the following iterative scheme:

  P̂(ωi) = (1/n) Σ_{k=1}^{n} P̂(ωi | xk, θ̂)

  μ̂i = Σ_{k=1}^{n} P̂(ωi | xk, θ̂) xk / Σ_{k=1}^{n} P̂(ωi | xk, θ̂)

  Σ̂i = Σ_{k=1}^{n} P̂(ωi | xk, θ̂)(xk − μ̂i)(xk − μ̂i)^t / Σ_{k=1}^{n} P̂(ωi | xk, θ̂)
where

  P̂(ωi | xk, θ̂) = |Σ̂i|^(−1/2) exp[−(1/2)(xk − μ̂i)^t Σ̂i^(−1) (xk − μ̂i)] P̂(ωi)
                    / Σ_{j=1}^{c} |Σ̂j|^(−1/2) exp[−(1/2)(xk − μ̂j)^t Σ̂j^(−1) (xk − μ̂j)] P̂(ωj)
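The iterative scheme alternates the posterior computation with the three re-estimation formulas; a compact sketch for d-dimensional data, in which the initialization, the small regularizer, and the fixed iteration count are illustrative choices rather than part of the slides:

```python
import numpy as np

def mixture_ml(x, c, iters=100):
    """Iterative ML estimation of the priors, means, and covariances of a
    c-component normal mixture, alternating the posterior with the three
    re-estimation formulas above."""
    n, d = x.shape
    # illustrative initialization: means spread along the first coordinate
    order = np.argsort(x[:, 0])
    mu = x[order[np.linspace(0, n - 1, c).astype(int)]].astype(float)
    cov = np.stack([np.cov(x.T) + 1e-6 * np.eye(d) for _ in range(c)])
    prior = np.full(c, 1.0 / c)
    for _ in range(iters):
        # posterior P(w_i | x_k, theta-hat); the common (2*pi)^(-d/2)
        # factor cancels in the normalization
        post = np.empty((n, c))
        for i in range(c):
            diff = x - mu[i]
            quad = np.einsum('kd,de,ke->k', diff, np.linalg.inv(cov[i]), diff)
            post[:, i] = prior[i] * np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(cov[i]))
        post /= post.sum(axis=1, keepdims=True)
        w = post.sum(axis=0)                 # effective sample count per class
        prior = w / n                        # P-hat(w_i)
        mu = (post.T @ x) / w[:, None]       # mu-hat_i
        for i in range(c):
            diff = x - mu[i]
            cov[i] = (post[:, i, None] * diff).T @ diff / w[i] + 1e-6 * np.eye(d)
    return prior, mu, cov
```

The small diagonal term added to each covariance is one way to keep the iteration away from the singular solutions discussed above.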
Clustering

• What is a cluster? A cluster is a collection of data points with similar properties.
• Partitional clustering
- C-means algorithm (hard)
- ISODATA algorithm
- Fuzzy c-means clustering

• Hierarchical clustering
- Single-linkage algorithm (minimum method)
- Complete-linkage algorithm (maximum method)
- Average-linkage algorithm
- Minimum-variance method (Ward's method)
C-Means Clustering

• Goal: find the c mean vectors μ1, μ2, …, μc.
• Replace the squared Mahalanobis distance in the previous discussion by the squared Euclidean distance ||xk − μ̂i||².
• Find the mean μ̂m nearest to xk and approximate P̂(ωi | xk, θ̂) as

  P̂(ωi | xk, θ̂) ≅ 1 if i = m, 0 otherwise
• Use the iterative scheme to find μ̂1, μ̂2, …, μ̂c.
• If n is the known number of patterns and c the desired number of clusters, the C-means algorithm is:

  Begin
    initialize n, c, μ1, μ2, …, μc (randomly selected)
    do
      classify the n samples according to the nearest μi
      recompute μi
    until no change in μi
    return μ1, μ2, …, μc
  End
C-means algorithm:
1. Begin with c cluster centers.
2. For each sample, find the cluster center nearest to it. Put the sample in the cluster represented by that center.
3. If no samples changed clusters, stop.
4. Recompute the cluster centers of the altered clusters and go back to step 2.

Properties:
• The number of clusters c must be given in advance.
• The goal is to minimize the squared error, but the algorithm can end up in a local minimum.

Demo: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
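The four steps above can be sketched as follows; the deterministic, spread-out initialization is an illustrative choice (the slides suggest random selection):

```python
import numpy as np

def c_means(x, c):
    """Hard c-means: alternate nearest-center assignment (step 2) and
    center recomputation (step 4) until no sample changes cluster."""
    # illustrative initialization: centers spread along the first coordinate
    order = np.argsort(x[:, 0])
    centers = x[order[np.linspace(0, len(x) - 1, c).astype(int)]].astype(float)
    labels = np.full(len(x), -1)
    while True:
        # step 2: assign each sample to its nearest center (squared Euclidean)
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # step 3: no changes, stop
            return centers, labels
        labels = new_labels
        # step 4: recompute the centers of the (non-empty) clusters
        for i in range(c):
            if np.any(labels == i):
                centers[i] = x[labels == i].mean(axis=0)
```

As noted above, the result depends on the initialization and may be only a local minimum of the squared error.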
Clustering: ISODATA

Similar to C-means, with some enhancements:
• Clusters with too few elements are discarded.
• Clusters are merged if the number of clusters grows too large or if clusters are too close together.
• A cluster is split if the number of clusters is too few or if the cluster contains very dissimilar samples.

Properties:
• The number of clusters c is not given exactly in advance.
• The algorithm may require extensive computation.
Solution Spaces for Clustering

Constrained c-partition matrices
Partition of Data
Defuzzification of Fuzzy Partitions

Conversion via max membership and α-cuts
Fuzzy c-Means (FCM) Clustering

• Let's now assume that a sample can belong to all the clusters. Define memberships uij with

  Σ_{i=1}^{c} uij = 1,   ∀ j = 1, …, n

• The cost function (or objective function) for FCM can then be written as

  J(U, c1, …, cc) = Σ_{i=1}^{c} Ji = Σ_{i=1}^{c} Σ_{j=1}^{n} uij^m dij²

  where uij is between 0 and 1; ci is the cluster center of fuzzy group i; dij = ||ci − xj|| is the Euclidean distance between the ith cluster center and the jth data point; and m ∈ [1, ∞) is a weighting exponent.
• The updated cluster center is

  ci = Σ_{j=1}^{n} uij^m xj / Σ_{j=1}^{n} uij^m

  and the updated membership is

  uij = 1 / Σ_{k=1}^{c} (dij / dkj)^(2/(m−1))
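The two update formulas can be alternated directly; a sketch in which the random membership initialization, the fixed iteration count, and m = 2 are illustrative choices:

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, iters=100):
    """FCM: alternate the center update and the membership update above."""
    rng = np.random.default_rng(0)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0, keepdims=True)        # enforce sum_i u_ij = 1
    for _ in range(iters):
        um = u ** m
        # center update: u_ij^m - weighted mean of the data points
        centers = um @ x / um.sum(axis=1, keepdims=True)
        # d_ij = ||c_i - x_j||, floored to avoid division by zero
        d = np.maximum(np.linalg.norm(centers[:, None, :] - x[None, :, :], axis=2),
                       1e-12)
        # membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        u = d ** (-2.0 / (m - 1))
        u /= u.sum(axis=0, keepdims=True)
    return centers, u
```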
C-means Algorithm (FCM/HCM)
Hierarchical Clustering

Agglomerative clustering (bottom up):
1. Begin with n clusters, each containing one sample.
2. Merge the two most similar clusters into one.
3. Repeat the previous step until done.

Divisive clustering (top down)

(Figure: merge sequence for samples {1, …, 7}: {1,7}, {5,6}, {2,4}, {3,5,6}, {1,3,5,6,7}, {1,2,3,4,5,6,7}.)
The most natural representation of hierarchical clustering is a corresponding tree, called a dendrogram, which shows how the samples are grouped.
Demo: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html
• Single-linkage algorithm (minimum method):

  Dmin(Ci, Cj) = min_{a∈Ci, b∈Cj} d(a, b)

• Complete-linkage algorithm (maximum method):

  Dmax(Ci, Cj) = max_{a∈Ci, b∈Cj} d(a, b)

• Average-linkage algorithm (average method):

  Dave(Ci, Cj) = (1 / (|Ci| |Cj|)) Σ_{a∈Ci, b∈Cj} d(a, b)
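The single-linkage rule can be sketched with a naive agglomerative loop (quadratic in the number of clusters per step, for illustration only; the stopping criterion by cluster count is an illustrative choice):

```python
import numpy as np

def single_linkage(x, target_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest single-linkage distance D_min until target_clusters remain."""
    clusters = [[i] for i in range(len(x))]
    merges = []                                  # record of merged groups
    while len(clusters) > target_clusters:
        best = (None, None, np.inf)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # D_min(Ci, Cj) = min over all cross-cluster point pairs
                d = min(np.linalg.norm(x[a] - x[b])
                        for a in clusters[i] for b in clusters[j])
                if d < best[2]:
                    best = (i, j, d)
        i, j, _ = best
        merges.append(sorted(clusters[i] + clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges
```

Swapping `min` for `max` in the inner comparison gives the complete-linkage variant, and averaging over all cross-cluster pairs gives average linkage.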
• Ward's method (minimum-variance method):

  DWard(Ci, Cj) = Σ_{j=1}^{d} σj² = Σ_{j=1}^{d} Σ_{i=1}^{m} (xij − μj)²

  d: the number of features
  m: the number of samples in Ci and Cj
Single-linkage algorithm (example figure)

Complete-linkage algorithm (example figure)