Intro. ANN & Fuzzy Systems Lecture 20 Clustering (1)


Page 1

Intro. ANN & Fuzzy Systems

Lecture 20: Clustering (1)

Page 2

(C) 2001 by Yu Hen Hu

Outline

• Unsupervised learning (Competitive Learning) and Clustering

• K-Means Clustering Algorithm

Page 3

Unsupervised Learning

• Data Mining
  – Understand the internal/hidden structure of the data distribution
• Labeling (target value, teaching input) cost is high
  – Large number of feature vectors
  – Sampling may involve costly experiments
  – Data labels may not be available at all
• Pre-processing for classification
  – Features within the same cluster are similar, and
  – often belong to the same class

Page 4

Competitive Learning

• A form of unsupervised learning.

• Neurons compete against each other with their activation values. The winner(s) earn the privilege of updating their weights. The losers may even be punished by having their weights updated in the opposite direction.

• Competitive and Cooperative Learning:

Competitive: Only one neuron's activation can be reinforced.

Cooperative: Several neurons' activations can be reinforced.

Page 5

Competitive Learning Rule

• A neuron WINS the competition if its output is largest among all neurons for the same input x(n).

• The weights of the winning neuron (the k-th) are adjusted by

  Δw_k(n) = η [x(n) - w_k(n)]

• The positions of the losing neurons remain unchanged.
• If the weights of a neuron represent its POSITION, and the output of a neuron is inversely proportional to the distance between x(n) and w_k(n), then

Competitive Learning = CLUSTERING!
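A minimal MATLAB/Octave sketch of this winner-take-all update (not the lecture's learncl1.m; the function name, the fixed learning rate eta, and the use of squared Euclidean distance in place of the neuron outputs are assumptions):

  function W = competitive_step(X, W, eta)
  % One pass of winner-take-all competitive learning over the data.
  % X   : d-by-K matrix of input samples x(n)
  % W   : d-by-c matrix whose columns are the neurons' weight (position) vectors
  % eta : learning rate (assumed constant here)
    for n = 1:size(X, 2)
      x = X(:, n);
      d2 = sum(bsxfun(@minus, W, x).^2, 1);    % squared distance to every neuron
      [~, k] = min(d2);                        % winner = closest neuron (largest output)
      W(:, k) = W(:, k) + eta * (x - W(:, k)); % move only the winner toward x(n)
    end
  end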

Page 6

Competitive Learning Example

learncl1.m

[Figure: four snapshots of the weight positions on the same axes (-2 to 2): initial, after 25 iterations, after 75 iterations, and at the end of 100 iterations.]

Page 7

What is “Clustering”?

What can we learn from these "unlabeled" data samples?
– Structures: some samples are closer to each other than to other samples
– The closeness between samples is determined using a "similarity measure"
– The number of samples per unit volume is related to the concept of "density" or "distribution"

[Figure: two scatter plots of unlabeled data samples.]
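The lecture does not fix a particular similarity measure; the sketch below uses the squared Euclidean distance between samples as one concrete choice (the function name pairwise_sqdist is an assumption, not from the lecture):

  function D2 = pairwise_sqdist(X)
  % Squared Euclidean distance between every pair of columns of X (d-by-K).
  % Small distance = high similarity under this choice of measure.
    G  = X' * X;                        % Gram matrix, K-by-K
    s  = diag(G);                       % squared norms of the samples
    D2 = bsxfun(@plus, s, s') - 2 * G;  % ||x_i - x_j||^2, expanded
    D2 = max(D2, 0);                    % guard against negative round-off
  end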

Page 8

Clustering Problem Statement

• Given a set of vectors {x_k; 1 ≤ k ≤ K}, find a set of c cluster centers {w(i); 1 ≤ i ≤ c} such that each x_k is assigned to a cluster, say w(i*), according to a distance (distortion, similarity) measure d(x_k, w(i)), such that the average distortion

  D = (1/K) Σ_{i=1..c} Σ_{k=1..K} I(x_k, i) d(x_k, w(i))

is minimized.

• I(x_k, i) = 1 if x_k is assigned to cluster i with cluster center w(i), and = 0 otherwise (the indicator function).
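As a small sketch of how D can be evaluated (not code from the lecture), assume d(x_k, w(i)) is the squared Euclidean distance and represent the indicator I(x_k, i) by a vector idx of cluster labels:

  function D = avg_distortion(X, W, idx)
  % Average distortion D = (1/K) * sum_k d(x_k, w(idx(k))).
  % X : d-by-K samples, W : d-by-c centers, idx : 1-by-K cluster labels
    K = size(X, 2);
    D = 0;
    for k = 1:K
      D = D + sum((X(:, k) - W(:, idx(k))).^2);  % squared Euclidean distance
    end
    D = D / K;
  end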

Page 9

k-means Clustering Algorithm

Initialization: initial cluster centers w(i), 1 ≤ i ≤ c; D(-1) = 0; I(x_k, i) = 0 for 1 ≤ i ≤ c, 1 ≤ k ≤ K.

Repeat:
(A) Assign cluster membership (Expectation step):
    Evaluate d(x_k, w(i)) for 1 ≤ i ≤ c, 1 ≤ k ≤ K.
    I(x_k, i) = 1 if d(x_k, w(i)) < d(x_k, w(j)) for all j ≠ i; = 0 otherwise, 1 ≤ k ≤ K.
(B) Evaluate the distortion:
    D(Iter) = (1/K) Σ_{i=1..c} Σ_{k=1..K} I(x_k, i) d(x_k, w(i))
(C) Update the code words according to the new assignment (Maximization step):
    w(i) = (1/N_i) Σ_{k=1..K} I(x_k, i) x_k,  where N_i = Σ_{k=1..K} I(x_k, i),  1 ≤ i ≤ c
(D) Check for convergence: if |1 - D(Iter-1)/D(Iter)| < ε, then convergent = TRUE.
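A minimal MATLAB/Octave sketch of steps (A)-(D) follows (this is not the lecture's own code; squared Euclidean distance for d and the function name kmeans_sketch are assumptions):

  function [W, idx, D] = kmeans_sketch(X, W, epsilon)
  % k-means following steps (A)-(D) above.
  % X : d-by-K data matrix, W : d-by-c initial centers, epsilon : relative tolerance
    c = size(W, 2);
    K = size(X, 2);
    D_prev = 0;                                 % D(-1) = 0
    convergent = false;
    while ~convergent
      % (A) Assign each sample to its nearest center (Expectation step)
      idx = zeros(1, K);
      for k = 1:K
        [~, idx(k)] = min(sum(bsxfun(@minus, W, X(:, k)).^2, 1));
      end
      % (B) Evaluate the average distortion D(Iter)
      D = 0;
      for k = 1:K
        D = D + sum((X(:, k) - W(:, idx(k))).^2);
      end
      D = D / K;
      % (C) Update each code word to the mean of its members (Maximization step)
      for i = 1:c
        if any(idx == i)
          W(:, i) = mean(X(:, idx == i), 2);
        end
      end
      % (D) Check convergence on the relative change in distortion
      convergent = (D == D_prev) || abs(1 - D_prev / D) < epsilon;
      D_prev = D;
    end
  end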

Page 10

A Numerical Example

x = {-1, -2, 0, 2, 3, 4},  W = {2.1, 2.3}

1. Assign membership:
   2.1: {-1, -2, 0, 2}
   2.3: {3, 4}

2. Distortion:
   D = (-1 - 2.1)² + (-2 - 2.1)² + (0 - 2.1)² + (2 - 2.1)² + (3 - 2.3)² + (4 - 2.3)²

3. Update W to minimize the distortion:
   w1 = (-1 - 2 + 0 + 2)/4 = -0.25
   w2 = (3 + 4)/2 = 3.5

4. Reassign membership:
   -0.25: {-1, -2, 0}
   3.5: {2, 3, 4}

5. Update W:
   w1 = (-1 - 2 + 0)/3 = -1
   w2 = (2 + 3 + 4)/3 = 3

Converged.
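As a cross-check, the example can be reproduced with the kmeans_sketch function from the previous page (a hypothetical helper, not part of the lecture code); the final centers come out at -1 and 3, as above:

  X  = [-1 -2 0 2 3 4];            % 1-by-6 data matrix (d = 1, K = 6)
  W0 = [2.1 2.3];                  % initial centers
  [W, idx] = kmeans_sketch(X, W0, 1e-6);
  disp(W);                         % prints  -1   3
  disp(idx);                       % prints   1 1 1 2 2 2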

Page 11

k-means Algorithm Demonstration

[Figure: left, the data points and the true cluster centers; right, the data samples, converged centers, true centers, and the cluster boundaries.]

Clusterdemo.m