Machine Learning: Clustering

Steffen Rendle

Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim

Winter semester 2007/2008


- Clustering: Overview, Examples, Clustering Tasks
- k-Means: Overview, Algorithm
- Agglomerative Clustering: Overview, Algorithm
- Use Case: Task, Method
- Summary



Overview

The objective of clustering is to group similar items of a data set D.

- The groups are called clusters.
- Clustering is unsupervised, i.e. neither training data nor classes are given in advance.
- The resulting grouping depends on the algorithm.



Example



Example: Clustering of Search Results



Example: Wafer Analysis

[Figure labels: Manufacturing process 1, Manufacturing process 2, ..., Test 1..n, Failure cause 1, Failure cause 2]



Example: Wafer Analysis



Example: Object Identification

DB:

    Shop        Product name                                                      Price
    T-Online    Fuji FinePix S5600                                                279,00
    Amazon      FujiFilm FinePix S5600 Digitalkamera (5 Megapixel, 10fach Zoom)   254,90
    Cyberport   Fuji FinePix S5600                                                259,90
    Mediamarkt  Fine Pix S 5600                                                   245,00

    Amazon      Fuji FinePix S5500 Digitalkamera (4 Megapixel, 10x opt. Zoom)     349,99



Clustering Tasks

- Hard clustering: find a partition of the data.
- Soft clustering / fuzzy clustering: find probabilities of group membership for each item.
- Hierarchical clustering: find a dendrogram (tree) of the data.



Hard-Clustering



Soft-Clustering



Hierarchical-Clustering

[Figure: dendrogram over the items A to K]



k-Means

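- Partitional clustering
- Given:
  - Data $D = \{d_1, \ldots, d_n\} \in P(\mathbb{R}^m)$ with $d_i = (x_{i,1}, \ldots, x_{i,m}) \in \mathbb{R}^m$
  - Number of clusters $k$
  - Similarity $\operatorname{sim} : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^+$
- To find:
  - Partition of the data $f : D \to \{1, \ldots, k\}$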



k-Means Algorithm

    function k-Means(D, k, sim)
        for all j ∈ {1, ..., k} do
            y_j ← random(D)
        end for
        repeat
            f' ← f
            f(d) ← argmax_{j ∈ {1,...,k}} sim(y_j, d)
            for all j ∈ {1, ..., k} do
                y_j ← avg_{d ∈ {d | f(d) = j}} d
            end for
        until f' = f
        return f
    end function
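The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the lecture's reference implementation: it assumes NumPy, uses the negative squared Euclidean distance as the similarity `sim` when none is passed, keeps the old centroid if a cluster becomes empty, and numbers clusters from 0. The names `k_means`, `rng` and `max_iter` are hypothetical.

```python
import numpy as np

def k_means(D, k, sim=None, rng=None, max_iter=100):
    """Sketch of k-Means: D is an (n, m) array; returns (labels f, centroids y)."""
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(rng)
    if sim is None:
        # Assumed similarity: negative squared Euclidean distance.
        sim = lambda y_j, d: -np.sum((y_j - d) ** 2)
    # y_j <- random element of D (k distinct data points).
    y = D[rng.choice(len(D), size=k, replace=False)].copy()
    f = None
    for _ in range(max_iter):
        f_prev = f
        # f(d) <- argmax_j sim(y_j, d), for every d in D
        f = np.array([np.argmax([sim(y[j], d) for j in range(k)]) for d in D])
        if f_prev is not None and np.array_equal(f_prev, f):
            break  # until f' = f
        # y_j <- average of all points currently assigned to cluster j
        for j in range(k):
            members = D[f == j]
            if len(members) > 0:  # assumption: keep the old centroid if empty
                y[j] = members.mean(axis=0)
    return f, y

points = [[0, 0], [0, 1], [1, 0], [100, 100], [100, 101], [101, 100]]
labels, centroids = k_means(points, k=2, rng=0)
print(labels)
```

Which of the two groups receives label 0 depends on the random initialization, so only the grouping itself is meaningful.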



Example



Problems of k-Means I



Problems of k-Means II



Problems of k-Means

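- k-Means is run several times (with different random initial centroids) and the "best" result is returned.
- To determine the "best" partition, heuristic measures such as the intra-cluster variance can be used:

$$\mathrm{ICV}(f, D) = \sum_{j=1}^{k} \; \sum_{d \in \{d \mid f(d) = j\}} \left\| d - \operatorname*{avg}_{d' \in \{d' \mid f(d') = j\}} d' \right\|^2$$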
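As a small worked illustration of this selection step, the sketch below computes the intra-cluster variance for two candidate partitions of the same toy data and keeps the one with the smaller value. It assumes NumPy; the function name `intra_cluster_variance`, the toy points and the candidate names are made up for this example, and in practice the candidates would come from several k-Means runs.

```python
import numpy as np

def intra_cluster_variance(f, D):
    """ICV(f, D): summed squared distance of every point to its cluster mean."""
    D = np.asarray(D, dtype=float)
    f = np.asarray(f)
    total = 0.0
    for j in np.unique(f):
        members = D[f == j]
        centroid = members.mean(axis=0)      # avg over the members of cluster j
        total += float(np.sum((members - centroid) ** 2))
    return total

# Four toy points: two on the left, two on the right.
D = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
candidates = {
    "left/right": [0, 0, 1, 1],   # hypothetical result of one run
    "bottom/top": [0, 1, 0, 1],   # hypothetical result of another run
}
scores = {name: intra_cluster_variance(f, D) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Here the left/right split has an ICV of 4 and the bottom/top split an ICV of 100, so the left/right partition would be returned.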



Properties of k-Means

- Easy to implement.
- In practice often fast.
- Data must be present in a metric space (e.g. Euclidean space: $\mathbb{R}^n$ with $\|\cdot\|$) so that centroids can be calculated.
  - Counterexample: strings



Agglomerative Clustering

Agglomerative Clustering can solve several tasks:

- partitional clustering with a given number of clusters k or a similarity threshold θ
- hierarchical clustering



Greedy Agglomerative Clustering

- Partitional clustering
- Given:
  - Data $D = \{d_1, \ldots, d_n\}$
  - Similarity $\operatorname{sim} : D \times D \to \mathbb{R}^+$
  - Number of clusters $k$ or threshold $\theta$ on similarities
- To find:
  - Partition of the data $f : D \to \mathbb{N}$



Hierarchical Agglomerative Clustering

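- Hierarchical clustering
- Given:
  - Data $D = \{d_1, \ldots, d_n\}$
  - Similarity $\operatorname{sim} : D \times D \to \mathbb{R}^+$
- To find:
  - Series $f_i$ of partitions of the data, $f_i : D \to \mathbb{N}$, with $\operatorname{img} f_{i+1} \subset \operatorname{img} f_i$ (each step merges two clusters)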



Agglomerative Clustering Algorithm

    function AgglomerativeClustering(D, sim)
        m ← 0
        for all i ∈ {1, ..., n} do
            f_m(d_i) ← i
        end for
        repeat
            (i, j) ← argmax_{i,j ∈ img(f_m), i ≠ j} sim_*(f_m, i, j)
            f_{m+1} ← f_m
            for all d ∈ {d' | f_m(d') = j} do
                f_{m+1}(d) ← i
            end for
            m ← m + 1
        until convergence(f_m)
        return f_m
    end function



Convergence

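convergence(f) depends on the task:

- if $k$ is given: $\text{convergence}(f) \Leftrightarrow |\operatorname{img}(f)| \le k$
- if $\theta$ is given: $\text{convergence}(f) \Leftrightarrow \max_{i,j \in \operatorname{img}(f),\, i \ne j} \operatorname{sim}_*(f, i, j) \le \theta$
- in case of hierarchical clustering: $\text{convergence}(f) \Leftrightarrow |\operatorname{img}(f)| = 1$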
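The algorithm and its three stopping rules can be sketched in Python as follows. This is only an illustration under assumptions: the cluster similarity is passed in as a function `cluster_sim` (the example uses the maximum pairwise similarity as one possible choice), the convergence test is evaluated before each merge rather than after it, and the names `agglomerative_clustering`, `max_pair` and the toy data are hypothetical.

```python
from itertools import combinations

def agglomerative_clustering(items, sim, cluster_sim, k=None, theta=None):
    """Greedy agglomerative clustering; returns a dict item -> cluster id.

    k or theta selects the stopping rule; with neither given, merging
    continues until a single cluster is left (hierarchical case).
    """
    clusters = {i: [x] for i, x in enumerate(items)}   # f_0(d_i) = i
    while len(clusters) > 1:
        if k is not None and len(clusters) <= k:
            break                                      # |img(f)| <= k
        # (i, j) = argmax over all pairs of distinct clusters
        best, (i, j) = max(
            (cluster_sim(sim, clusters[a], clusters[b]), (a, b))
            for a, b in combinations(clusters, 2)
        )
        if theta is not None and best <= theta:
            break                                      # max similarity <= theta
        clusters[i].extend(clusters.pop(j))            # relabel cluster j as i
    return {x: i for i, members in clusters.items() for x in members}

def max_pair(sim, A, B):
    """One possible cluster similarity: the most similar cross-cluster pair."""
    return max(sim(a, b) for a in A for b in B)

numbers = [1, 2, 3, 10, 11, 30]
closeness = lambda a, b: 1.0 / (1.0 + abs(a - b))      # toy similarity
by_k = agglomerative_clustering(numbers, closeness, max_pair, k=3)
by_theta = agglomerative_clustering(numbers, closeness, max_pair, theta=0.2)
print(by_k, by_theta)
```

On this toy data both stopping rules yield the groups {1, 2, 3}, {10, 11} and {30}.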



Similarity between Clusters

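[Figure: two clusters over the items A to E with the pairwise similarities 0.9, 0.82, 0.6, 0.7, 0.2 and 0.63; similarity between the clusters: ?]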



Similarity between Clusters

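Several possibilities for the similarity $\operatorname{sim}_*(f, i, j)$ between clusters:

- single linkage: $\operatorname{sim}_{SL}(f, i, j) = \max_{(d, d') \in f^{-1}(i) \times f^{-1}(j)} \operatorname{sim}(d, d')$
- complete linkage: $\operatorname{sim}_{CL}(f, i, j) = \min_{(d, d') \in f^{-1}(i) \times f^{-1}(j)} \operatorname{sim}(d, d')$
- average linkage: $\operatorname{sim}_{AL}(f, i, j) = \operatorname{avg}_{(d, d') \in f^{-1}(i) \times f^{-1}(j)} \operatorname{sim}(d, d')$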
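The three definitions differ only in how the pairwise similarities are aggregated, which a short Python sketch makes concrete. The six similarity values are the ones shown in the figures of this section; which item pair carries which value, and the split into the clusters {A, B} and {C, D, E}, are assumptions made purely for illustration, as are the function names.

```python
def single_linkage(sim, A, B):
    """Similarity of the most similar cross-cluster pair."""
    return max(sim(a, b) for a in A for b in B)

def complete_linkage(sim, A, B):
    """Similarity of the least similar cross-cluster pair."""
    return min(sim(a, b) for a in A for b in B)

def average_linkage(sim, A, B):
    """Mean similarity over all cross-cluster pairs."""
    values = [sim(a, b) for a in A for b in B]
    return sum(values) / len(values)

# Hypothetical assignment of the six values from the figures to item pairs.
pair_sim = {
    ("A", "C"): 0.9, ("A", "D"): 0.82, ("A", "E"): 0.6,
    ("B", "C"): 0.7, ("B", "D"): 0.2, ("B", "E"): 0.63,
}
sim = lambda a, b: pair_sim[(a, b)]
left, right = ["A", "B"], ["C", "D", "E"]

print(single_linkage(sim, left, right))              # 0.9
print(complete_linkage(sim, left, right))            # 0.2
print(round(average_linkage(sim, left, right), 2))   # 0.64
```

Because only the multiset of cross-cluster values matters, any other assignment of the six values to pairs gives the same three results.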



Single Linkage

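[Figure: the same two clusters with the pairwise similarities 0.9, 0.82, 0.6, 0.7, 0.2 and 0.63; similarity between the clusters under single linkage: 0.9]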



Complete Linkage

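[Figure: the same two clusters with the pairwise similarities 0.9, 0.82, 0.6, 0.7, 0.2 and 0.63; similarity between the clusters under complete linkage: 0.2]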



Average Linkage

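[Figure: the same two clusters with the pairwise similarities 0.9, 0.82, 0.6, 0.7, 0.2 and 0.63; similarity between the clusters under average linkage: 0.64]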



Example: Agglomerative Clustering with Average Linkage

[Figure: scatter plot of the items A to K, shown on every step together with the current similarity matrix and the dendrogram built so far]

Initial similarity matrix:

             A    B    C    D    E    F    G    H    I    J    K
    A
    B       .8
    C       .8   .9
    D       .5   .9   .8
    E       .2   .2   .3   .2
    F       .1   .1   .2   .1   .9
    G       .1   .2   .3   .2   .9   .8
    H       .2   .2   .2   .3   .1   .0   .2
    I       .2   .2   .2   .3   .2   .1   .3   .9
    J       .0   .1   .1   .2   .2   .1   .3   .8   .9
    K       .0   .1   .1   .2   .1   .0   .3   .8   .9   .9

After merging B and C (similarity .9):

             A   BC    D    E    F    G    H    I    J    K
    A
    BC      .8
    D       .5  .85
    E       .2  .25   .2
    F       .1  .15   .1   .9
    G       .1  .25   .2   .9   .8
    H       .2  .20   .3   .1   .0   .2
    I       .2  .20   .3   .2   .1   .3   .9
    J       .0  .10   .2   .2   .1   .3   .8   .9
    K       .0  .10   .2   .1   .0   .3   .8   .9   .9

After merging J and K (similarity .9):

             A   BC    D    E    F    G    H    I   JK
    A
    BC      .8
    D       .5  .85
    E       .2  .25   .2
    F       .1  .15   .1   .9
    G       .1  .25   .2   .9   .8
    H       .2  .20   .3   .1   .0   .2
    I       .2  .20   .3   .2   .1   .3   .9
    JK      .0  .10   .2  .15  .05   .3   .8   .9

After merging H and I (similarity .9):

             A   BC    D    E    F    G   HI   JK
    A
    BC      .8
    D       .5  .85
    E       .2  .25   .2
    F       .1  .15   .1   .9
    G       .1  .25   .2   .9   .8
    HI      .2  .20   .3  .15  .05  .25
    JK      .0  .10   .2  .15  .05   .3  .85

After merging E and F (similarity .9):

             A   BC    D   EF    G   HI   JK
    A
    BC      .8
    D       .5  .85
    EF     .15  .20  .15
    G       .1  .25   .2  .85
    HI      .2  .20   .3  .10  .25
    JK      .0  .10   .2  .10   .3  .85

After merging BC and D (similarity .85):

             A  BCD   EF    G   HI   JK
    A
    BCD     .7
    EF     .15  .18
    G       .1  .23  .85
    HI      .2  .23  .10  .25
    JK      .0  .13  .10   .3  .85

After merging EF and G (similarity .85):

             A  BCD  EFG   HI   JK
    A
    BCD     .7
    EFG    .13  .20
    HI      .2  .23  .15
    JK      .0  .13  .16  .85

After merging HI and JK (similarity .85):

             A  BCD  EFG HIJK
    A
    BCD     .7
    EFG    .13  .20
    HIJK    .1  .18  .16

After merging A and BCD (similarity .7):

          ABCD  EFG HIJK
    ABCD
    EFG    .18
    HIJK   .16  .16

After merging ABCD and EFG (similarity .18):

             ABCDEFG    HIJK
    ABCDEFG
    HIJK         .16

After merging ABCDEFG and HIJK (similarity .16), all items form the single cluster ABCDEFGHIJK.
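Individual entries of the matrices in this example can be checked by averaging the original item similarities over all cross-cluster pairs. The sketch below does this in Python; the lower-triangle values are copied from the initial similarity matrix of the example, while the names `ITEMS`, `ROWS`, `SIM` and `average_linkage` are introduced here only for illustration.

```python
from itertools import product

ITEMS = "ABCDEFGHIJK"
# Lower triangle of the initial similarity matrix (rows B to K).
ROWS = [
    [.8],
    [.8, .9],
    [.5, .9, .8],
    [.2, .2, .3, .2],
    [.1, .1, .2, .1, .9],
    [.1, .2, .3, .2, .9, .8],
    [.2, .2, .2, .3, .1, .0, .2],
    [.2, .2, .2, .3, .2, .1, .3, .9],
    [.0, .1, .1, .2, .2, .1, .3, .8, .9],
    [.0, .1, .1, .2, .1, .0, .3, .8, .9, .9],
]
SIM = {}
for r, row in enumerate(ROWS, start=1):
    for c, value in enumerate(row):
        # Store both directions so lookups do not depend on the pair order.
        SIM[ITEMS[r], ITEMS[c]] = SIM[ITEMS[c], ITEMS[r]] = value

def average_linkage(A, B):
    """Mean item similarity over all pairs from cluster A x cluster B."""
    pairs = list(product(A, B))
    return sum(SIM[p] for p in pairs) / len(pairs)

for A, B in [("A", "BC"), ("BC", "D"), ("HI", "JK"), ("ABCD", "HIJK")]:
    print(A, B, round(average_linkage(A, B), 4))
```

The printed values (0.8, 0.85, 0.85 and 0.1625) agree with the entries .8, .85, .85 and .16 in the corresponding matrices.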



Properties of Agglomerative Clustering

- Several tasks can be solved: partitional clustering with a given number of clusters or a threshold, and hierarchical clustering.
- No metric space is necessary.
- Runtime complexity $O(n^2 \log n)$.



Use Case: Object Identification

- Object Identification (OI) finds identical items for information integration.
- OI tasks are semi-supervised.
- OI models use both clustering and classification techniques.



DB:

    Shop        Product name                                                      Price
    T-Online    Fuji FinePix S5600                                                279,00
    Amazon      FujiFilm FinePix S5600 Digitalkamera (5 Megapixel, 10fach Zoom)   254,90
    Cyberport   Fuji FinePix S5600                                                259,90
    Mediamarkt  Fine Pix S 5600                                                   245,00

    Amazon      Fuji FinePix S5500 Digitalkamera (4 Megapixel, 10x opt. Zoom)     349,99



Object Identification Problem

[Figure: a problem over the items A to I and its solution]



Adaptive Setting

[Figure: a problem over the items A to I and its solution, together with a training set over the items J to R; labels L1, L2, L3]



Types of Labels

Often some parts of the data provide information about identities:

- Some offers are labeled by a unique identifier, e.g. an EAN, UPC or ISBN.
- New offers should be merged into an already integrated database, e.g. new products or new shops should be integrated.
- Some offers are known to be identical or different, e.g. as provided by a supervisor.
- N databases should be merged, and each database contains no duplicates.



Iterative Problem $C_{\text{iter}}$

[Figure: an iterative problem over the items A to I, in which some items carry the labels L1, L2, L3 and the others have an unknown class label, shown next to a consistent solution]



Constrained Problem $C_{\text{constr}}$

[Figure: a constrained problem over the items A to I with must-link and cannot-link constraints, shown next to a consistent solution]



Problem Classes

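Problem classes are defined by their preconditions, which restrict the space $\mathcal{E}$ of consistent solutions $E \subseteq X^2$:

- Iterative Problems $C_{\text{iter}}$
  - given: $E_Y$ with $Y \subseteq X$
  - $\mathcal{E} = \{E \mid E_Y = E \cap Y^2\}$
- Constrained Problems $C_{\text{constr}}$
  - given: $R_{ml} \subseteq X^2$, $R_{cl} \subseteq X^2$
  - $\mathcal{E} = \{E \mid E \supseteq R_{ml} \wedge E \cap R_{cl} = \emptyset\}$
- Matching Problems $C_{\text{match}}$
  - given: $X = \bigcup A_i$ with $A = (A_1, \ldots, A_n)$
  - $\mathcal{E} = \{E \mid E \cap (X^2 \setminus (\bigcup A_i^2 \setminus \{(x, x) \mid x \in A_i\})) = \emptyset\}$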
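The consistency conditions for iterative and constrained problems can be checked mechanically once a solution is represented as a set of pairs. The following Python sketch assumes that a solution E is the equivalence relation induced by a partition and that must-link and cannot-link constraints are given as sets of pairs; all function and variable names are hypothetical.

```python
from itertools import product

def equivalence_relation(partition):
    """Solution E as a subset of X^2: all pairs (x, y) sharing a cluster."""
    return {(x, y) for cluster in partition
            for x, y in product(cluster, repeat=2)}

def consistent_constrained(E, R_ml, R_cl):
    """E contains every must-link pair and no cannot-link pair."""
    return R_ml <= E and not (E & R_cl)

def consistent_iterative(E, E_Y, Y):
    """E restricted to the already labeled items Y equals the known E_Y."""
    return {(x, y) for (x, y) in E if x in Y and y in Y} == E_Y

E = equivalence_relation([{"A", "B"}, {"C"}])

ok = consistent_constrained(E, R_ml={("A", "B")}, R_cl={("B", "C")})
violated = consistent_constrained(E, R_ml=set(), R_cl={("A", "B")})

Y = {"A", "C"}
same_known = consistent_iterative(E, equivalence_relation([{"A"}, {"C"}]), Y)
diff_known = consistent_iterative(E, equivalence_relation([{"A", "C"}]), Y)
print(ok, violated, same_known, diff_known)
```

The candidate solution {A, B}, {C} satisfies the first constraint set, violates a cannot-link on (A, B), and is consistent only with the known labeling that keeps A and C apart.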



Hierarchy of Problem Classes

One can show:

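$C_{\text{classic}} \subset C_{\text{iter}} \subset C_{\text{constr}}$

$C_{\text{classic}} \subset C_{\text{match}} \subset C_{\text{constr}}$

$C_{\text{iter}} \not\subseteq C_{\text{match}}$

$C_{\text{match}} \not\subseteq C_{\text{iter}}$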



There are constrained problems that cannot be expressed as an iterative problem:

[Figure: a constrained problem over the items A, B, G, H with a must-link constraint, shown next to two iterative problems with the labels L1 and L2]



Generic Object Identification Model

- Feature extraction: $f_{\text{feature}} : X^2 \to \mathbb{R}^n$
- Probabilistic pairwise decision model: $f_{\text{pairwise}} : X^2 \to [0, 1]$
- Collective decision model: $f_{\text{global}} : P(X) \times P(X^2) \times P(X^2) \to \mathcal{E}$



Data

    Object  Brand            Product Name                        Price
    x1      Hewlett Packard  Photosmart 435 Digital Camera       118.99
    x2      HP               HP Photosmart 435 16MB memory       110.00
    x3      Canon            Canon EOS 300D black 18-55 Camera   786.00

Feature Extraction

    Object Pair  TFIDF-Cosine Similarity  FirstNumberEqual  Rel. Difference
                 (Product Name)           (Product Name)    (Price)
    (x1, x2)     0.6                      1                 0.076
    (x1, x3)     0.1                      0                 0.849
    (x2, x3)     0.0                      0                 0.860

Probabilistic Pairwise Decision Model

    Object Pair  P[xi ≡ xj]
    (x1, x2)     0.8
    (x1, x3)     0.2
    (x2, x3)     0.1
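Two of the three feature columns can be reproduced with a few lines of Python. The definitions used here are assumptions, since the slides do not spell them out: `first_number_equal` compares the first run of digits in each product name, and `relative_difference` divides the absolute price difference by the larger price, which happens to reproduce the table values. The TFIDF-cosine feature is omitted because it depends on corpus statistics that are not given.

```python
import re
from itertools import combinations

def first_number_equal(a, b):
    """1 if the first digit run in both names is identical, else 0 (assumed)."""
    na, nb = re.search(r"\d+", a), re.search(r"\d+", b)
    return int(na is not None and nb is not None and na.group() == nb.group())

def relative_difference(p, q):
    """|p - q| / max(p, q); an assumed definition that matches the table."""
    return abs(p - q) / max(p, q)

offers = {
    "x1": ("Photosmart 435 Digital Camera", 118.99),
    "x2": ("HP Photosmart 435 16MB memory", 110.00),
    "x3": ("Canon EOS 300D black 18-55 Camera", 786.00),
}

features = {}
for (id_a, (name_a, price_a)), (id_b, (name_b, price_b)) in combinations(offers.items(), 2):
    features[id_a, id_b] = (
        first_number_equal(name_a, name_b),
        round(relative_difference(price_a, price_b), 3),
    )
print(features)
```

A trained pairwise model would then map each such feature vector to a probability like the ones listed in the last table.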



Learning and Constraints

Information provided by constraints can be used for training an identification model:

- Probabilistic pairwise decision model: a trained classifier (e.g. SVM).
- Collective decision model: a constrained clustering algorithm (e.g. constrained HAC) that uses the pairwise decision model as a learned similarity measure.



Constrained Agglomerative Clustering Algorithm

    function ConstrainedAgglClustering(X, R_ml, R_cl, sim)
        m ← 0
        for all i ∈ {1, ..., n} do
            f_m(x_i) ← i
        end for
        f_m ← ApplyMustLink(f_m, R_ml)
        repeat
            (i, j) ← argmax_{i,j ∈ img(f_m), i ≠ j, not HasCannotLink(f_m, i, j, R_cl)} sim_*(f_m, i, j)
            f_{m+1} ← f_m
            for all x ∈ {y | f_m(y) = j} do
                f_{m+1}(x) ← i
            end for
            m ← m + 1
        until convergence(f_m)
        return f_m
    end function


Constrained Agglomerative Clustering Algorithm

    function ApplyMustLink(f, R_ml)
        for all (x, y) ∈ R_ml do
            for all x' : f(x') = f(x) do
                f(x') ← f(y)
            end for
        end for
        return f
    end function

    function HasCannotLink(f, i, j, R_cl)
        return ∃ x ∈ f^{-1}(i), y ∈ f^{-1}(j) : (x, y) ∈ R_cl
    end function
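Put together, the constrained variant can be sketched in Python as below. The sketch makes several assumptions that the pseudocode leaves open: it uses average linkage as the cluster similarity, stops at a given number of clusters k, treats cannot-link pairs as symmetric, and stops early if every remaining merge is forbidden. The function name `constrained_agglomerative` and the toy data are hypothetical.

```python
from itertools import combinations

def constrained_agglomerative(items, sim, must_link, cannot_link, k):
    """Constrained agglomerative clustering; returns a dict item -> cluster id."""
    label = {x: i for i, x in enumerate(items)}        # f_0(x_i) = i

    # ApplyMustLink: merge the clusters of every must-link pair up front.
    for x, y in must_link:
        old, new = label[x], label[y]
        for z in items:
            if label[z] == old:
                label[z] = new

    def members(c):
        return [x for x in items if label[x] == c]

    def has_cannot_link(i, j):
        A, B = set(members(i)), set(members(j))
        # Assumption: a cannot-link pair forbids the merge in either order.
        return any((x in A and y in B) or (x in B and y in A)
                   for x, y in cannot_link)

    def average_linkage(i, j):
        values = [sim(a, b) for a in members(i) for b in members(j)]
        return sum(values) / len(values)

    while len(set(label.values())) > k:
        candidates = [
            (average_linkage(i, j), i, j)
            for i, j in combinations(sorted(set(label.values())), 2)
            if not has_cannot_link(i, j)
        ]
        if not candidates:
            break   # assumption: stop when all remaining merges are forbidden
        _, i, j = max(candidates)
        for z in members(j):                           # relabel cluster j as i
            label[z] = i
    return label

closeness = lambda a, b: 1.0 / (1.0 + abs(a - b))      # toy similarity
result = constrained_agglomerative(
    [1, 2, 3, 10, 12], closeness,
    must_link=[(3, 10)], cannot_link=[(1, 2)], k=2,
)
print(result)
```

In this toy run the must-link forces 3 and 10 together even though they are far apart, and the cannot-link keeps 1 in its own cluster, so the result is {1} and {2, 3, 10, 12}.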



Summary

- Clustering groups data.
- Groups depend on the similarity and the clustering method.
- Clustering is an unsupervised task.
- Semi-supervised clustering can use labels (e.g. on relations) to learn the similarity measure and to enhance clustering.



Outlook

- Fuzzy / soft clustering, e.g. Fuzzy C-Means
  - cluster membership is a probability distribution
- Spectral clustering
  - similarity matrix $S_{ij} := \operatorname{sim}(d_i, d_j)$
  - use spectral methods on $S_{ij}$, e.g. eigenvectors, to compute clusters
- Constrained / semi-supervised clustering
  - constraints on objects, pairs, etc. are present
  - example: object identification



Literature

A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Comput. Surv., 31(3):264–323, 1999.

S. Rendle and L. Schmidt-Thieme. Object identification with constraints. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM-2006), Hong Kong, 2006.

