Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.
What is clustering?
Given a collection of objects (characterized by feature vectors, or just a matrix of pairwise similarities), detect the presence of distinct groups and assign objects to groups.
Another example
Why should we care about clustering?
Clustering is a basic step in most data mining procedures:
Examples:
Clustering movie viewers for movie ranking.
Clustering proteins by their functionality.
Clustering text documents for content similarity.
Clustering is one of the most widely used tools for exploratory data analysis. The social sciences, biology, astronomy, computer science, and other fields all apply clustering to gain a first understanding of the structure of large data sets.
The Theory-Practice Gap
Yet there is distressingly little theoretical understanding of clustering.
What questions should research address?
What is clustering?
What is “good” clustering?
Can clustering be carried out efficiently?
Can we distinguish “clusterable” from “structureless” data?
Many more …
“Clustering” is an ill-defined problem
There are many different clustering tasks, leading to different clustering paradigms:
[Figure: There are Many Clustering Tasks]
Some more examples
[Figure, three panels: “2-d data set”, “Compact partitioning into two strata”, “Unsupervised learning”]
I would like to discuss the broad notion of clustering, independently of any particular algorithm, objective function, or generative data model.
What for?
Choosing a suitable algorithm for a given task.
Evaluating the quality of clustering methods.
Distinguishing significant structure from random fata morgana.
Distinguishing clusterable data from structureless data.
Providing performance guarantees for clustering algorithms.
Much more …
The Basic Setting
For a finite domain set $S$, a dissimilarity function (DF) is a symmetric mapping $d: S \times S \to \mathbb{R}^{+}$ such that $d(x,y) = 0$ iff $x = y$.
A clustering function takes a dissimilarity function on $S$ and returns a partition of $S$.
We wish to define the properties that distinguish clustering functions (from any other functions that output domain partitions).
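To make the basic setting concrete, here is a minimal Python sketch, assuming the domain is $S = \{0, \ldots, n-1\}$, a DF is stored as an $n \times n$ matrix, and a partition is a frozenset of frozensets. The names `is_dissimilarity`, `is_partition`, and `ClusteringFunction` are hypothetical helpers introduced only for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

# Hypothetical representation: a DF over S = {0, ..., n-1} as an n x n matrix.
Matrix = List[List[float]]
# A partition of S: disjoint, non-empty blocks that together cover S.
Partition = FrozenSet[FrozenSet[int]]
# A clustering function maps a DF over S to a partition of S.
ClusteringFunction = Callable[[Matrix], Partition]


def is_dissimilarity(d: Matrix) -> bool:
    """Check the DF conditions: square, symmetric, d(x, y) = 0 iff x = y."""
    n = len(d)
    if any(len(row) != n for row in d):
        return False
    if any(d[x][x] != 0 for x in range(n)):
        return False
    for x, y in combinations(range(n), 2):
        # Symmetry, and strictly positive distance between distinct points.
        if d[x][y] != d[y][x] or d[x][y] <= 0:
            return False
    return True


def is_partition(p: Partition, n: int) -> bool:
    """Check that p splits {0, ..., n-1} into disjoint, non-empty blocks."""
    seen: set = set()
    for block in p:
        if not block or seen & block:
            return False
        seen |= block
    return seen == set(range(n))
```

For instance, `[[0, 1], [1, 0]]` passes, while `[[0, 0], [0, 0]]` fails because two distinct points are at distance 0.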
Kleinberg’s Axioms
Scale Invariance: $F(\lambda d) = F(d)$ for all $d$ and all strictly positive $\lambda$.
Richness: For any finite domain $S$, $\{F(d) : d \text{ is a DF over } S\} = \{P : P \text{ a partition of } S\}$.
Consistency: If $d'$ equals $d$ except for shrinking distances within clusters of $F(d)$ or stretching between-cluster distances (w.r.t. $F(d)$), then $F(d) = F(d')$.
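The axioms quantify over all DFs, so code cannot verify them; it can only test particular instances. The sketch below, assuming a DF over $S = \{0, \ldots, n-1\}$ stored as an $n \times n$ matrix and a partition stored as a frozenset of frozensets, builds $\lambda d$ and checks whether a given $d'$ is a "consistent change" of $d$. All function names are hypothetical, and Richness, being a statement about the whole range of $F$, is not checked here.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

Matrix = List[List[float]]
Partition = FrozenSet[FrozenSet[int]]
ClusteringFunction = Callable[[Matrix], Partition]


def scaled(d: Matrix, lam: float) -> Matrix:
    """Return the DF lam * d used in the Scale Invariance axiom."""
    if lam <= 0:
        raise ValueError("lambda must be strictly positive")
    return [[lam * v for v in row] for row in d]


def same_cluster(p: Partition, a: int, b: int) -> bool:
    return any(a in block and b in block for block in p)


def is_consistent_change(d: Matrix, d_new: Matrix, p: Partition) -> bool:
    """True if d_new differs from d only by shrinking distances inside the
    clusters of p and stretching distances between them (p plays F(d))."""
    for a, b in combinations(range(len(d)), 2):
        inside = same_cluster(p, a, b)
        if inside and d_new[a][b] > d[a][b]:
            return False
        if not inside and d_new[a][b] < d[a][b]:
            return False
    return True


def violates_scale_invariance(F: ClusteringFunction, d: Matrix,
                              lambdas: Iterable[float]) -> bool:
    """Search the given lambdas for one with F(lambda * d) != F(d)."""
    return any(F(scaled(d, lam)) != F(d) for lam in lambdas)


def violates_consistency(F: ClusteringFunction, d: Matrix,
                         d_new: Matrix) -> bool:
    """True if d_new is a consistent change of d but F(d_new) != F(d)."""
    return is_consistent_change(d, d_new, F(d)) and F(d_new) != F(d)
```

A `True` from either `violates_*` function is a concrete counterexample to that axiom; a `False` says nothing about other DFs or other choices of $\lambda$.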
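Note that any pair of the three axioms is realizable.
Consider Single-Linkage with different stopping criteria:
k connected components (satisfies Scale Invariance and Consistency).
Distance-r stopping (satisfies Richness and Consistency).
Scale-α stopping: add edges as long as their length is at most α · (max-distance) (satisfies Scale Invariance and Richness).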
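The three stopping criteria can be sketched as one Kruskal-style routine: sort the pairs by distance, merge components with union-find, and stop according to the chosen rule. This is an illustrative sketch, assuming a DF stored as an $n \times n$ matrix; the function name `single_linkage` and its keyword parameters `k`, `r`, `alpha` are hypothetical, and ties between equal-length edges are broken by Python's stable sort, a detail the slide does not address.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional

Matrix = List[List[float]]
Partition = FrozenSet[FrozenSet[int]]


def single_linkage(d: Matrix, *, k: Optional[int] = None,
                   r: Optional[float] = None,
                   alpha: Optional[float] = None) -> Partition:
    """Add edges in order of increasing length, merging the components they
    join, until the chosen stopping rule fires. Give exactly one of:
      k     -- stop when k connected components remain
      r     -- only add edges of length at most r
      alpha -- only add edges of length at most alpha * (max distance)
    """
    if sum(x is not None for x in (k, r, alpha)) != 1:
        raise ValueError("give exactly one of k, r, alpha")
    n = len(d)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(combinations(range(n), 2), key=lambda e: d[e[0]][e[1]])
    if alpha is not None:
        threshold = alpha * max((d[a][b] for a, b in edges), default=0.0)
    else:
        threshold = r  # stays None when the k rule is used
    components = n
    for a, b in edges:
        if k is not None and components <= k:
            break
        if threshold is not None and d[a][b] > threshold:
            break  # edges are sorted, so every later edge is at least as long
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1

    blocks: dict = {}
    for x in range(n):
        blocks.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(b) for b in blocks.values())
```

On the 1-D points 0, 1, 10, 11 (with $d(x,y) = |x - y|$), the calls with `k=2`, `r=5`, and `alpha=0.5` all return {{0, 1}, {2, 3}}. After multiplying every distance by 10, the `k` and `alpha` rules still return that partition, while `r=5` returns four singletons, an instance of distance-r stopping failing Scale Invariance.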
Kleinberg’s Impossibility result
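There exists no clustering function that satisfies Scale Invariance, Richness, and Consistency (on any domain with at least two points).
Proof (figure): two steps, “Scaling up” and then “Consistency”.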
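One way to fill in the two steps named on the slide is the following argument. It is a reconstruction offered as a sketch, not necessarily the exact argument drawn in the original figure.

```latex
\textbf{Claim.} For a domain $S$ with $|S| \ge 2$, no clustering function $F$
satisfies Scale Invariance, Richness and Consistency.

\textbf{Proof sketch.} By Richness there are DFs $d_1, d_2$ over $S$ with
$F(d_1) = \{S\}$ and $F(d_2) = \{\{x\} : x \in S\}$.
\begin{enumerate}
  \item \emph{Scaling up.} Let
    $\lambda = \max_{x \neq y} d_2(x,y) \,/\, \min_{x \neq y} d_1(x,y) > 0$.
    Then $\lambda d_1(x,y) \ge d_2(x,y)$ for all $x \neq y$, and by
    Scale Invariance $F(\lambda d_1) = F(d_1) = \{S\}$.
  \item \emph{Consistency.} With respect to the one-cluster partition $\{S\}$,
    every pair lies within a cluster, and $d_2 \le \lambda d_1$ pointwise, so
    $d_2$ arises from $\lambda d_1$ by shrinking within-cluster distances.
    Consistency gives $F(d_2) = F(\lambda d_1) = \{S\}$.
\end{enumerate}
Since $|S| \ge 2$, $\{S\} \ne \{\{x\} : x \in S\} = F(d_2)$, a contradiction.
$\blacksquare$
```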
A Different Perspective
The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.
Axioms as a tool for classifying clustering paradigms
Axioms as a tool for a taxonomy of clustering paradigms
Columns, grouped on the slide under “Axioms” and “Properties”: Scale Invariance, Richness, Local Consistency, Full Consistency
Single Linkage: - + + +
Center Based: + + + -
Spectral: + + + -
MDL: + + -
Rate Distortion: + + -
Ideal Theory
We would like to have a list of simple properties so that major clustering methods are distinguishable from each other using these properties.
We would like the axioms to be such that all clustering methods satisfy all of them, and nothing that is clearly not a clustering satisfies all of them (this is probably too much to hope for).
In the remainder of this talk, I would like to discuss some candidate “axioms” and “properties” to get a taste of what this theory-development program may involve.
Types of Axioms/Properties
Richness requirements
E.g., relaxations of Kleinberg’s richness, such as
$\{F(d) : d \text{ is a DF over } S\} = \{P : P \text{ a partition of } S \text{ into } k \text{ sets}\}$.
Invariance/Robustness/Stability requirements.
E.g., Scale Invariance, Consistency, robustness to perturbations of $d$ (“smoothness” of $F$), or stability w.r.t. sampling of $S$.
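For a small domain, the relaxed richness requirement with $k$ sets can be checked exhaustively for k-cluster single linkage: for every partition $P$ of $S$ into $k$ sets, build a witness DF (distance 1 inside blocks, 2 across blocks) and confirm the algorithm returns $P$. This is a sketch under stated assumptions: the witness construction and all helper names (`partitions_into_k`, `witness`, `single_linkage_k`, `k_rich_on`) are hypothetical. It checks the "every $k$-partition is reachable" direction; the other direction holds because the k rule always stops at exactly $k$ components when $n \ge k$.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List

Matrix = List[List[float]]
Partition = FrozenSet[FrozenSet[int]]


def partitions_into_k(n: int, k: int) -> Iterator[Partition]:
    """Yield every partition of {0, ..., n-1} into exactly k non-empty blocks."""
    def rec(i: int, blocks: List[List[int]]) -> Iterator[Partition]:
        if i == n:
            if len(blocks) == k:
                yield frozenset(frozenset(b) for b in blocks)
            return
        for b in blocks:            # put i into an existing block ...
            b.append(i)
            yield from rec(i + 1, blocks)
            b.pop()
        if len(blocks) < k:         # ... or open a new block containing i
            blocks.append([i])
            yield from rec(i + 1, blocks)
            blocks.pop()
    yield from rec(0, [])


def witness(p: Partition, n: int) -> Matrix:
    """Hypothetical witness DF: 1 inside a block of p, 2 across blocks."""
    block_of = {x: i for i, block in enumerate(p) for x in block}
    return [[0 if a == b else (1 if block_of[a] == block_of[b] else 2)
             for b in range(n)] for a in range(n)]


def single_linkage_k(d: Matrix, k: int) -> Partition:
    """Single linkage stopped when k connected components remain."""
    n = len(d)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            x = parent[x]
        return x

    components = n
    for a, b in sorted(combinations(range(n), 2), key=lambda e: d[e[0]][e[1]]):
        if components <= k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    blocks: dict = {}
    for x in range(n):
        blocks.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(b) for b in blocks.values())


def k_rich_on(n: int, k: int) -> bool:
    """Does k-cluster single linkage output every k-partition of an n-point set?"""
    return all(single_linkage_k(witness(p, n), k) == p
               for p in partitions_into_k(n, k))
```

The witness works because all length-1 edges lie inside blocks and supply exactly the $n - k$ merges needed to reach $k$ components, so the algorithm stops before touching any length-2 edge.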
Relaxations of Consistency
Local Consistency:
Let $C_1, \ldots, C_k$ be the clusters of $F(d)$.
For every $\lambda_0 \ge 1$ and positive $\lambda_1, \ldots, \lambda_k \le 1$, if $d'$ is defined by
$$d'(a,b) = \begin{cases} \lambda_i\, d(a,b) & \text{if } a \text{ and } b \text{ are in } C_i, \\ \lambda_0\, d(a,b) & \text{if } a, b \text{ are not in the same } F(d)\text{-cluster,} \end{cases}$$
then $F(d) = F(d')$.
Is there any known clustering method for which it fails?
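Local Consistency can be spot-checked numerically: draw $\lambda_0 \ge 1$ and $\lambda_1, \ldots, \lambda_k \in (0, 1]$ at random, build $d'$ as defined above, and compare $F(d')$ with $F(d)$. A `False` result is a genuine counterexample, while `True` only means none was found in the sampled trials. The sketch assumes a DF stored as an $n \times n$ matrix and uses k-cluster single linkage as the example $F$; all names are hypothetical, and the sampling ranges for the $\lambda$s are arbitrary choices.

```python
import random
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

Matrix = List[List[float]]
Partition = FrozenSet[FrozenSet[int]]


def single_linkage_k(d: Matrix, k: int) -> Partition:
    """Example F: single linkage stopped at k connected components."""
    n = len(d)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            x = parent[x]
        return x

    components = n
    for a, b in sorted(combinations(range(n), 2), key=lambda e: d[e[0]][e[1]]):
        if components <= k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    blocks: dict = {}
    for x in range(n):
        blocks.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(b) for b in blocks.values())


def locally_transformed(d: Matrix, clusters: Sequence[FrozenSet[int]],
                        lam0: float, lams: Sequence[float]) -> Matrix:
    """d' from the Local Consistency definition: distances inside clusters[i]
    are multiplied by lams[i] (0 < lams[i] <= 1), all others by lam0 >= 1."""
    if lam0 < 1 or not all(0 < lam <= 1 for lam in lams):
        raise ValueError("need lam0 >= 1 and 0 < lam_i <= 1")
    n = len(d)
    index = {x: i for i, block in enumerate(clusters) for x in block}
    d_new = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        factor = lams[index[a]] if index[a] == index[b] else lam0
        d_new[a][b] = d_new[b][a] = factor * d[a][b]
    return d_new


def locally_consistent_on(F: Callable[[Matrix], Partition], d: Matrix,
                          trials: int = 100, seed: int = 0) -> bool:
    """Randomized spot check of Local Consistency for F at the single DF d."""
    rng = random.Random(seed)
    clusters = list(F(d))  # fix an order C_1, ..., C_k
    for _ in range(trials):
        lam0 = rng.uniform(1.0, 10.0)
        lams = [rng.uniform(1e-3, 1.0) for _ in clusters]
        if F(locally_transformed(d, clusters, lam0, lams)) != F(d):
            return False  # a concrete counterexample
    return True
```

For example, with `F = lambda m: single_linkage_k(m, 2)` and the 1-D points 0, 1, 10, 11, no counterexample is found: the within-cluster edges only shrink and the between-cluster edges only grow, so the two shortest edges stay inside the clusters.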
Other types of clustering
Culotta and McCallum’s “Clusterwise Similarity”
Edge-Detection (advantage to smooth contours)
Texture clustering
The professors example.
Conclusions and open questions
There is a place for developing an axiomatic framework for clustering.
The existing negative results do not rule out the possibility of a useful axiomatization.
We should also develop a system of “clustering properties” for a taxonomy of clustering methods.
There are many possible routes to take, and hidden subtleties in this project.