Post on 13-Apr-2017
Outline
1. Fuzzy Clustering: History; Fuzzy C-Means Clustering; Using the Lagrange Multipliers; The Final Algorithm; Pros and Cons of FCM
2. What can we do? Possibilistic Clustering: Introduction; Cost Function; Explanation
Some of the Fuzzy Clustering Models
Fuzzy Clustering Model (Bezdek, 1981)
Possibilistic Clustering Model (Krishnapuram and Keller, 1993)
Fuzzy Possibilistic Clustering Model (N. Pal, K. Pal, and Bezdek, 1997)
Fuzzy C-Means Clustering
The input: an unlabeled data set X = {x_1, x_2, x_3, ..., x_N}, with x_k ∈ R^p.
Output: a partition of X represented as a C × N membership matrix U, and a set of cluster centers V = {v_1, v_2, ..., v_C} ⊂ R^p.
What we want
Creation of the Cost Function. First, we can use a distance defined as

\[ \|x_k - v_i\| = \sqrt{(x_k - v_i)^T (x_k - v_i)} \quad (1) \]

the Euclidean distance from a point x_k to a centroid v_i. Note: other distances, such as those based on the Mahalanobis distance, can also be used.
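As a quick sketch of Eq. (1), here is how the point-to-centroid Euclidean distances can be computed for all pairs at once (numpy assumed; the function name is illustrative):

```python
import numpy as np

def euclidean_distances(X, V):
    """Distances ||x_k - v_i|| of Eq. (1).

    X: (N, p) data matrix, V: (C, p) centroid matrix.
    Returns a (C, N) matrix D with D[i, k] = ||x_k - v_i||.
    """
    diff = X[None, :, :] - V[:, None, :]                    # (C, N, p)
    return np.sqrt(np.einsum('cnp,cnp->cn', diff, diff))    # sqrt of sum of squares over p
```

The cost functions below all use the square of this distance, so in practice one often keeps the squared form and skips the square root.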
Do you remember the cost function for K-means?
We look for a partition S that minimizes the following function:

\[ \min_S \sum_{i=1}^{C} \sum_{k : x_k \in C_i} \|x_k - v_i\|^2 \quad (2) \]

where \( v_i = \frac{1}{N_i} \sum_{x_k \in C_i} x_k \).

We can rewrite the previous equation as

\[ \min_S \sum_{k=1}^{N} \sum_{i=1}^{C} I(x_k \in C_i) \|x_k - v_i\|^2 \quad (3) \]
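A minimal sketch of the indicator form in Eq. (3): each point contributes the squared distance to the one centroid whose indicator is 1 (numpy assumed; names are illustrative):

```python
import numpy as np

def kmeans_cost(X, V, labels):
    """K-means cost, Eq. (3): sum_k sum_i I(x_k in C_i) ||x_k - v_i||^2.

    labels[k] = i means x_k belongs to cluster C_i, so the indicator
    selects exactly one centroid per point.
    """
    diff = X - V[labels]                  # (N, p): each point minus its own centroid
    return float(np.sum(diff * diff))
```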
In addition
Did you notice that the membership is always one or zero?

\[ \min_S \sum_{k=1}^{N} \sum_{i=1}^{C} \overbrace{I(x_k \in C_i)}^{\text{Membership}} \|x_k - v_i\|^2 \quad (4) \]
Thus, we can rethink the membership using something "Fuzzy"
What if we modify the cost function to something like this?

\[ \min_S \sum_{k=1}^{N} \sum_{i=1}^{C} \overbrace{\text{Fuzzy Value}}^{\text{Membership}} \|x_k - v_i\|^2 \quad (5) \]

This means that we regard each cluster C_i as "fuzzy": we assume a fuzzy set for the cluster C_i with membership function

\[ A_i : \mathbb{R}^p \to [0, 1] \quad (6) \]

which we can tune by raising it to a power m; since the values lie in [0, 1], a larger m decreases them.
Under the following constraints
First:

\[ A_i(x_k) \in [0, 1] \quad \forall i, k \quad (7) \]

Second:

\[ 0 < \sum_{k=1}^{N} A_i(x_k) < N \quad \forall i \quad (8) \]

Third:

\[ \sum_{i=1}^{C} A_i(x_k) = 1 \quad \forall k \quad (9) \]
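Constraints (7)-(9) can be checked directly on a candidate membership matrix. A small sketch (numpy assumed; the function is hypothetical, with U[i, k] = A_i(x_k)):

```python
import numpy as np

def is_fuzzy_pseudo_partition(U, tol=1e-9):
    """Check constraints (7)-(9) for a C x N membership matrix U."""
    C, N = U.shape
    in_range = np.all((U >= -tol) & (U <= 1 + tol))                    # (7)
    row_sums = U.sum(axis=1)
    nonempty = np.all((row_sums > 0) & (row_sums < N))                 # (8)
    sums_to_one = np.allclose(U.sum(axis=0), 1.0, atol=tol)            # (9)
    return bool(in_range and nonempty and sums_to_one)
```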
Final Cost Function
Properties:

\[ J_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [A_i(x_k)]^m \|x_k - v_i\|^2 \quad (10) \]

Under the constraints:
- A_i(x_k) ∈ [0, 1], for 1 ≤ k ≤ N and 1 ≤ i ≤ C.
- \( \sum_{i=1}^{C} A_i(x_k) = 1 \), for 1 ≤ k ≤ N.
- \( 0 < \sum_{k=1}^{N} A_i(x_k) < N \), for 1 ≤ i ≤ C.
- m > 1.
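A minimal numpy sketch of evaluating J_m in Eq. (10), given data, centers, and a membership matrix (names are illustrative):

```python
import numpy as np

def fcm_cost(X, V, U, m=2.0):
    """Fuzzy C-means cost J_m of Eq. (10).

    X: (N, p) data, V: (C, p) centers, U: (C, N) memberships, m > 1.
    """
    diff = X[None, :, :] - V[:, None, :]      # (C, N, p)
    d2 = np.sum(diff ** 2, axis=2)            # (C, N) squared distances
    return float(np.sum((U ** m) * d2))
```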
Using the Lagrange Multipliers
New cost function:

\[ \bar{J}_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [A_i(x_k)]^m \|x_k - v_i\|^2 - \sum_{k=1}^{N} \lambda_k \left[ \sum_{i=1}^{C} A_i(x_k) - 1 \right] \quad (11) \]

Differentiate with respect to A_i(x_k):

\[ \frac{\partial \bar{J}_m(S)}{\partial A_i(x_k)} = m A_i(x_k)^{m-1} \|x_k - v_i\|^2 - \lambda_k = 0 \quad (12) \]

Thus

\[ A_i(x_k) = \left[ \frac{\lambda_k}{m \|x_k - v_i\|^2} \right]^{\frac{1}{m-1}} \quad (13) \]
Using the Lagrange Multipliers
Sum over all i's and use constraint (9):

\[ \sum_{i=1}^{C} A_i(x_k) = \frac{\lambda_k^{\frac{1}{m-1}}}{m^{\frac{1}{m-1}}} \sum_{i=1}^{C} \frac{1}{\|x_k - v_i\|^{\frac{2}{m-1}}} = 1 \quad (14) \]

Thus

\[ \lambda_k = \frac{m}{\left[ \sum_{i=1}^{C} \frac{1}{\|x_k - v_i\|^{\frac{2}{m-1}}} \right]^{m-1}} \quad (15) \]

Plug back into equation 12, using j instead of i for the sum:

\[ \frac{m}{\left[ \sum_{j=1}^{C} \frac{1}{\|x_k - v_j\|^{\frac{2}{m-1}}} \right]^{m-1}} = m A_i(x_k)^{m-1} \|x_k - v_i\|^2 \quad (16) \]
Finally
We have that

\[ A_i(x_k) = \frac{1}{\sum_{j=1}^{C} \left\{ \frac{\|x_k - v_i\|^2}{\|x_k - v_j\|^2} \right\}^{\frac{1}{m-1}}} \quad (17) \]

In a similar way, we have

\[ v_i = \frac{\sum_{k=1}^{N} A_i(x_k)^m x_k}{\sum_{k=1}^{N} A_i(x_k)^m} \quad (18) \]
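The two closed-form updates, Eqs. (17) and (18), can be sketched in numpy as follows (a small eps guard stands in for the zero-distance case, which the full algorithm below handles explicitly):

```python
import numpy as np

def update_memberships(X, V, m=2.0, eps=1e-12):
    """Membership update, Eq. (17). Returns a (C, N) matrix U.

    Eq. (17) is equivalent to normalizing d_ik^(-2/(m-1)) over i.
    """
    d2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)   # (C, N)
    d2 = np.maximum(d2, eps)                 # guard against zero distances
    inv = d2 ** (-1.0 / (m - 1.0))           # ||x_k - v_i||^(-2/(m-1))
    return inv / inv.sum(axis=0, keepdims=True)

def update_centers(X, U, m=2.0):
    """Center update, Eq. (18): weighted mean with weights A_i(x_k)^m."""
    W = U ** m                               # (C, N)
    return (W @ X) / W.sum(axis=1, keepdims=True)
```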
Final Algorithm: Fuzzy C-Means
1. Let t = 0. Select an initial fuzzy pseudo-partition.
2. Calculate the C cluster centers using
\[ v_i^{(t)} = \frac{\sum_{k=1}^{N} A_i^{(t)}(x_k)^m x_k}{\sum_{k=1}^{N} A_i^{(t)}(x_k)^m}. \]
3. Update the membership of each x_k:
   - Case I: if \( \|x_k - v_i^{(t)}\|^2 > 0 \) for all i ∈ {1, 2, ..., C}, then
\[ A_i^{(t+1)}(x_k) = \frac{1}{\sum_{j=1}^{C} \left\{ \frac{\|x_k - v_i^{(t)}\|^2}{\|x_k - v_j^{(t)}\|^2} \right\}^{\frac{1}{m-1}}} \]
   - Case II: if \( \|x_k - v_i^{(t)}\|^2 = 0 \) for some i ∈ I ⊆ {1, 2, ..., C}, then define the A_i^{(t+1)}(x_k) as any nonnegative numbers such that \( \sum_{i \in I} A_i^{(t+1)}(x_k) = 1 \) and A_i^{(t+1)}(x_k) = 0 for i ∉ I.
4. If \( \left| S^{(t+1)} - S^{(t)} \right| = \max_{i,k} \left| A_i^{(t+1)}(x_k) - A_i^{(t)}(x_k) \right| \le \varepsilon \), stop; otherwise increase t and go to step 2.
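The steps above can be sketched as one compact loop (numpy assumed; an eps clamp on the distances approximates Case II rather than implementing it exactly):

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means loop following the algorithm above.

    Stops when max_{i,k} |A^(t+1) - A^(t)| <= eps.
    Returns (U, V): U of shape (C, N), V of shape (C, p).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((C, N))
    U /= U.sum(axis=0, keepdims=True)        # step 1: initial fuzzy pseudo-partition
    V = None
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)                   # step 2
        d2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)
        d2 = np.maximum(d2, 1e-12)           # crude stand-in for Case II
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)                 # step 3, Case I
        if np.max(np.abs(U_new - U)) <= eps:                         # step 4
            return U_new, V
        U = U_new
    return U, V
```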
Final Output
The matrix U: the elements of U are U_{ik} = A_i(x_k).
The centroids: V = {v_1, v_2, ..., v_C}.
Pros and Cons of Fuzzy C-Means
Advantages:
- Unsupervised.
- Always converges.
Disadvantages:
- Long computational time.
- Sensitivity to the initial guess (speed, local minima).
- Sensitivity to noise: one expects a low (or even zero) membership degree for outliers (noisy points).
Krishnapuram and Keller
Following Zadeh, they regarded each class prototype as defining an elastic constraint.
What? t_i(x_k) gives the degree of compatibility of sample x_k with cluster C_i.
We do the following: we consider the C_i as fuzzy sets over the set of samples X = {x_1, x_2, ..., x_N}.
Here is the Catch!!!
We should not use the old membership constraint

\[ \sum_{i=1}^{C} A_i(x_k) = 1 \quad (19) \]

Because this is quite probabilistic, which is not what we want!
Thus we only ask for the membership, now written in the possibilistic notation t_i(x_k) (known as a typicality value), to lie in the interval [0, 1].
New Constraints
First:

\[ t_i(x_k) \in [0, 1] \quad \forall i, k \quad (20) \]

Second:

\[ 0 < \sum_{k=1}^{N} t_i(x_k) < N \quad \forall i \quad (21) \]

Third:

\[ \max_i t_i(x_k) > 0 \quad \forall k \quad (22) \]
We have the following cost function
Cost Function:

\[ \sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \|x_k - v_i\|^2 \quad (23) \]

Problem: unconstrained optimization of this first term leads to the trivial solution t_i(x_k) = 0 for all i, k.
Thus, we introduce a pressure toward

\[ t_i(x_k) \to 1 \quad (24) \]

Roughly, this means making the typicality values as large as possible.
We can try to control this tendency
By putting them all together in

\[ \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (25) \]

with m controlling the tendency of t_i(x_k) → 1.
We can also apply this tendency across all the clusters, using a suitable weight w_i > 0 per cluster:

\[ \sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (26) \]
Possibilistic C-Means Clustering (PCM)
The final cost function:

\[ J_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \|x_k - v_i\|^2 + \sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (27) \]

where t_i(x_k) are typicality values and w_i are cluster weights.
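A minimal sketch of evaluating the PCM cost of Eq. (27), with both the data term and the typicality penalty (numpy assumed; names are illustrative):

```python
import numpy as np

def pcm_cost(X, V, T, w, m=2.0):
    """Possibilistic C-means cost J_m, Eq. (27).

    X: (N, p) data, V: (C, p) centers,
    T: (C, N) typicality matrix with T[i, k] = t_i(x_k),
    w: (C,) positive cluster weights.
    """
    d2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)   # (C, N)
    data_term = np.sum((T ** m) * d2)
    penalty_term = np.sum(w[:, None] * (1.0 - T) ** m)
    return float(data_term + penalty_term)
```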
Explanation
First term:

\[ \sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \|x_k - v_i\|^2 \quad (28) \]

It demands that the distances from the feature vectors to the prototypes be as small as possible.
Second term:

\[ \sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (29) \]

It forces the typicality values t_i(x_k) to be as large as possible.
Final Updating Equations
Typicality values:

\[ t_i(x_k) = \frac{1}{1 + \left( \frac{\|x_k - v_i\|^2}{w_i} \right)^{\frac{1}{m-1}}}, \quad \forall i, k \quad (30) \]

Cluster centers:

\[ v_i = \frac{\sum_{k=1}^{N} t_i(x_k)^m x_k}{\sum_{k=1}^{N} t_i(x_k)^m} \quad (31) \]
Final Updating Equations
Weights:

\[ w_i = M \, \frac{\sum_{k=1}^{N} [t_i(x_k)]^m \|x_k - v_i\|^2}{\sum_{k=1}^{N} [t_i(x_k)]^m}, \quad (32) \]

with M > 0.
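One round of the PCM updates, Eqs. (30)-(32), can be sketched as follows (numpy assumed; the update order, function name, and the default M = 1.0 are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def pcm_update(X, V, T, m=2.0, M=1.0):
    """One round of the PCM updates.

    Weights w_i (Eq. 32) are computed from the current typicalities,
    then typicalities (Eq. 30) and centers (Eq. 31) are refreshed.
    """
    d2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)       # (C, N)
    Tm = T ** m
    w = M * np.sum(Tm * d2, axis=1) / np.sum(Tm, axis=1)            # Eq. (32)
    w = np.maximum(w, 1e-12)                                        # keep w_i > 0
    T_new = 1.0 / (1.0 + (d2 / w[:, None]) ** (1.0 / (m - 1.0)))    # Eq. (30)
    Tm_new = T_new ** m
    V_new = (Tm_new @ X) / Tm_new.sum(axis=1, keepdims=True)        # Eq. (31)
    return T_new, V_new, w
```

Note that, unlike FCM, nothing normalizes the columns of T to sum to 1: each cluster's typicalities depend only on that cluster's own distances and weight.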
Possibilistic can deal with outliers
[Figure: clustering result after running without outliers; typicality values shown on a scale from 0 to 1.]
Pros and Cons of Possibilistic C-Means
Advantages:
- Handles noisy data samples.
Disadvantages:
- Very sensitive to good initialization.
In between!!!
- Coincident clusters may result, because the columns and rows of the typicality matrix are independent of each other. This could be advantageous (start with a large value of C and obtain fewer distinct clusters).
Nevertheless
There are more advanced clustering methods based on the possibilistic and fuzzy ideas: Pal, N.R., Pal, K., Keller, J.M., and Bezdek, J.C., "A Possibilistic Fuzzy c-Means Clustering Algorithm," IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 517-530, Aug. 2005.