Classification & Clustering
Computer Science
Universiteit Maastricht
Institute for Knowledge and Agent Technology
Classification & Clustering
Pieter Spronck
http://www.cs.unimaas.nl/p.spronck
19 Apr 2023
Binary Division of Marbles
Big vs. Small
Transparent vs. Opaque
Marble Attributes
Size (big vs. small)
Transparency (transparent vs. opaque)
Shininess (shiny vs. dull)
Colouring (monochrome vs. polychrome)
Colour (blue, green, yellow, …)
…
Grouping of Marbles
“Marbles”
“Honouring All Distinctions”
“Colour Coding”
“Natural Grouping”
if transparent
    then if coloured glass
        then group 1
        else group 3
    else group 2
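The nested rule on this slide can be written as a small function. This is only a sketch: the function name, the boolean encoding and the three sample marbles are assumptions made for illustration, not part of the slides.

```python
def natural_group(transparent: bool, coloured_glass: bool) -> int:
    """Return the group number (1, 2 or 3) produced by the nested rule."""
    if transparent:
        # Transparent marbles are split by whether the glass itself is coloured.
        return 1 if coloured_glass else 3
    # Every opaque marble ends up in group 2, whatever its other attributes.
    return 2


# Hypothetical marbles, described only by the two attributes the rule inspects.
marbles = {
    "green glass": (True, True),
    "clear glass": (True, False),
    "opaque": (False, False),
}
groups = {name: natural_group(*attrs) for name, attrs in marbles.items()}
print(groups)
```

Note that the rule never looks at size, shininess or colouring, which is why it produces only three groups.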
Types of Clusters
Uniquely classifying clusters
Overlapping clusters
Probabilistic clusters
Dendrograms
Uniquely Classifying Clusters
Overlapping Clusters
Probabilistic Clustering
Cluster   Green   Blue   Typical samples
1         1.0     0.0
2         0.0     1.0
3         0.1     0.9
4         0.5     0.5
Dendrogram
[Figure: dendrogram with node labels: opaque, transparent, not clear, clear]
Classification
Ordering of entities into groups based on their similarity
Minimisation of within-group variance
Maximisation of between-group variance
Exhaustive and exclusive
Principal technique: clustering
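The two variance criteria can be made concrete with a small sketch. It uses sums of squared deviations, which equal the variances up to a constant factor. The marble diameters and the two groupings are invented for illustration.

```python
from statistics import mean


def within_between(groups):
    """Return (within-group, between-group) sums of squares for 1-D data."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Within: spread of each value around the mean of its own group.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Between: spread of the group means around the overall mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return within, between


# Hypothetical marble diameters in millimetres, grouped in two different ways.
good = [[10, 11, 12], [24, 25, 26]]   # small marbles together, big ones together
bad = [[10, 25, 12], [24, 11, 26]]    # sizes mixed across the groups

print("good grouping:", within_between(good))
print("bad grouping: ", within_between(bad))
```

Because the within-group and between-group sums always add up to the same total for a fixed data set, minimising one is equivalent to maximising the other.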
Reasons for Classification
Descriptive power
Parsimony
Maintainability
Versatility
Identification of distinctive attributes
Typology vs. Taxonomy
Typology – conceptual
Taxonomy – empirical
Typology
Define conceptual attributes
Select appropriate attributes
Create typology matrix (substruction)
Insert empirical entities in matrix
Extend matrix if necessary
Reduce matrix if necessary
Defining Conceptual Attributes
Meaningful
Focus on ideal types
Order of importance
Exhaustive domains
Conceptual Marble Attributes
Typology Matrix
                Transparency
Colouring       Opaque        Transparent
Monochrome
Polychrome
Matrix Extension
Transparency:             Transparent             Opaque
Glass:                    Clear     Not clear     Clear     Not clear
Colouring     Size
Monochrome    Big
              Small
Polychrome    Big
              Small
Reduction
Functional reduction
Pragmatic reduction
Numerical reduction
Reduction by using criterion types
Functional Reduction
Transparency:             Transparent             Opaque
Glass:                    Clear     Not clear     Clear     Not clear
Colouring     Size
Monochrome    Big
              Small
Polychrome    Big
              Small
Functionally Reduced Matrix
Transparency:             Transparent             Opaque
Glass:                    Clear     Not clear
Colouring     Size
Monochrome    Big
              Small
Polychrome    Big
              Small
Pragmatic Reduction
Transparency:             Transparent             Opaque
Glass:                    Clear     Not clear     Clear     Not clear
Colouring     Size
Monochrome    Big
              Small
Polychrome    Big
              Small
Pragmatically Reduced Matrix
Transparency:             Transparent             Opaque
Glass:                    Clear     Not clear
Size      Colouring
Small     Monochrome
          Polychrome
Big
Criticising Typological Classification
Reification
Resilience
Problematic attribute selection
Unmanageability
Taxonomy
Define empirical attributes
Select appropriate attributes
Create entity matrix
Apply clustering technique
Analyse clusters
Empirical Attributes
Big
Single colour
Lots of colours
Green glass
Transparent
Blue
Yellow
White
Dull
Shiny
Selecting Attributes
Size (big/small)
Colour (yellow, green, blue, red, white…)
Colouring (monochrome/polychrome)
Shininess (shiny/dull)
Transparency (transparent/opaque)
Glass colour (clear, green, …)
Entity Matrix
Big  Monochrome  Shiny  Transparent   |   Big  Monochrome  Shiny  Transparent
N    Y           Y      Y             |   N    Y           Y      N
N    Y           Y      Y             |   N    Y           Y      N
N    Y           Y      Y             |   N    Y           Y      N
N    Y           Y      Y             |   N    Y           Y      N
N    Y           Y      Y             |   N    Y           Y      N
N    N           N      N             |   N    N           Y      Y
Y    N           N      N             |   Y    N           Y      Y
Y    Y           Y      N             |
Automatic Clustering Parameters
Agglomerative vs. divisive
Monothetic vs. polythetic
Outliers permitted
Limits to number of clusters
Form of linkage (single, complete, average)
…
Automatic Clustering
NYYY   small, monochrome, shiny, transparent
*NNN   polychrome, dull, opaque
*NYY   polychrome, shiny, transparent
YYYN   big, monochrome, shiny, opaque
NYYN   small, monochrome, shiny, opaque
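The cluster descriptions above can be checked against the entity matrix with a short sketch. It assumes the flattened matrix holds fifteen marbles coded in the order Big, Monochrome, Shiny, Transparent, and that "*" means "either value"; the helper names are hypothetical.

```python
from collections import Counter

# Entity matrix as read from the slide: Big, Monochrome, Shiny, Transparent (Y/N).
marbles = (
    ["NYYY"] * 5 + ["NNNN", "YNNN", "YYYN"]
    + ["NYYN"] * 5 + ["NNYY", "YNYY"]
)

# Cluster descriptions from the slide; "*" matches either value.
patterns = ["NYYY", "*NNN", "*NYY", "YYYN", "NYYN"]


def matches(pattern, marble):
    """True if every non-wildcard position of the pattern equals the marble's value."""
    return all(p in ("*", m) for p, m in zip(pattern, marble))


def assign(marble):
    """Return all cluster patterns that describe this marble."""
    return [p for p in patterns if matches(p, marble)]


# Count how many marbles fall under each cluster description.
counts = Counter(assign(m)[0] for m in marbles)
for pattern in patterns:
    print(pattern, counts[pattern])
```

Under these assumptions every marble matches exactly one description, so the five clusters are exhaustive and exclusive for this sample.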
Polythetic to Monothetic
NYYY   small, monochrome, shiny, transparent
*NNN   polychrome, dull, opaque
*NYY   polychrome, shiny, transparent
NYYN   small, monochrome, shiny, opaque
*YYN   monochrome, shiny, opaque
Analysing Clusters
“Vanilla”
“Stone”
“Tiger”
“Classic”
small, monochrome, shiny, transparent
polychrome, dull, opaque
polychrome, shiny, transparent
small, monochrome, shiny, opaque
Criticising Taxonomical Classification
Dependent on specimens
Difficult to generalise
Difficult to label
Biased towards academic discipline
Not the “last word”
Typology vs. Taxonomy
Typology                                          Taxonomy
Conceptual                                        Empirical
Subjective                                        Objective
Manual                                            (Mostly) automatic
Less discriminative                               More discriminative
Goes awry when there are insufficient insights    Goes awry when there are insufficient specimens
Operational Classification
Typology (conceptual)
Taxonomy (empirical)
Operational typology (conceptual + empirical)
Automated Clustering Methods
Iterative distance-based clustering: the k-means method
Incremental clustering: the Cobweb method
Probability-based clustering: the EM algorithm
k-Means Method
Iterative distance-based clustering
Divisive
Polythetic
Predefined number of clusters (k)
Outliers permitted
k-Means (pass 1)
k = 2
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
k-Means (pass 2)
Cluster average: small, monochrome, shiny, transparent
Cluster average: small, polychrome, dull, opaque
k = 2
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
k-Means (pass 3)
Cluster average: small, monochrome, shiny, transparent
Cluster average: big, polychrome, dull, opaque
k = 2
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
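The passes above can be sketched as a plain k-means loop over the four binary attributes. This is a minimal illustration, not the slides' exact run: the marble list is the entity matrix as read earlier, Y/N is coded as 1/0, and the two starting centroids are picked by hand (an assumption) rather than at random so the result is reproducible.

```python
def encode(pattern):
    """Turn a Y/N string (Big, Monochrome, Shiny, Transparent) into a 0/1 tuple."""
    return tuple(1 if ch == "Y" else 0 for ch in pattern)


def sq_dist(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans(points, initial_centroids, max_passes=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_passes):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each centroid becomes the average of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


marbles = (
    ["NYYY"] * 5 + ["NNNN", "YNNN", "YYYN"]
    + ["NYYN"] * 5 + ["NNYY", "YNYY"]
)
points = [encode(m) for m in marbles]

# Hand-picked seeds (assumption): one NYYY marble and one NNNN marble.
initial = [(0, 1, 1, 1), (0, 0, 0, 0)]
assignment, centroids = kmeans(points, initial)
for c, centre in enumerate(centroids):
    print(c, assignment.count(c), [round(v, 2) for v in centre])
```

A different choice of seeds can lead to a different final partition, which is the dependence on initial setup that k-means is known for.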
Cobweb Algorithm
Incremental clustering
Agglomerative
Polythetic
Dynamic number of clusters
Outliers permitted
Cobweb Procedure
Builds a tree by adding instances to it
Uses a Category Utility function to determine the quality of the clustering
Changes the tree structure if this positively influences the Category Utility (by merging nodes or splitting nodes)
“Cutoff” value may be used to group sufficiently similar instances together
Category Utility
Measure for quality of clustering
The better the predictive value of the average attribute values of the instances in the clusters for the individual attribute values, the higher the CU will be
$$CU(C_1, \ldots, C_k) = \frac{\sum_{l} \Pr[C_l] \sum_{i} \sum_{j} \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)}{k}$$
Category Utility for “Size” (1)
C1 C2
a) Pr[size=big|C1] = 1/3
b) Pr[size=big|C2] = 1/3
c) Pr[size=big] = 1/3
d) Pr[C1] = 1/2
e) Pr[size=small|C1] = 2/3
f) Pr[size=small|C2] = 2/3
g) Pr[size=small] = 2/3
h) Pr[C2] = 1/2
CU = (d((a²–c²)+(e²–g²)) + h((b²–c²)+(f²–g²)))/2 = 0
Category Utility for “Size” (2)
C1 C2
a) Pr[size=big|C1] = 2/3
b) Pr[size=big|C2] = 0
c) Pr[size=big] = 1/3
d) Pr[C1] = 1/2
e) Pr[size=small|C1] = 1/3
f) Pr[size=small|C2] = 1
g) Pr[size=small] = 2/3
h) Pr[C2] = 1/2
CU = (d((a²–c²)+(e²–g²)) + h((b²–c²)+(f²–g²)))/2
   = ((1/2)((1/3)+(–1/3)) + (1/2)((–1/9)+(5/9)))/2 = 1/9
Category Utility for “Size” (3)
C1 C2
a) Pr[size=big|C1] = 1
b) Pr[size=big|C2] = 0
c) Pr[size=big] = 1/3
d) Pr[C1] = 1/3
e) Pr[size=small|C1] = 0
f) Pr[size=small|C2] = 1
g) Pr[size=small] = 2/3
h) Pr[C2] = 2/3
CU = (d((a²–c²)+(e²–g²)) + h((b²–c²)+(f²–g²)))/2
   = ((1/3)((8/9)+(–4/9)) + (2/3)((–1/9)+(5/9)))/2 = 2/9
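The three worked examples can be reproduced with a short sketch of category utility for a single nominal attribute. The concrete marble counts (six marbles, two of them big) are an assumption chosen to match the probabilities on the slides; exact fractions are used so the results come out as 0, 1/9 and 2/9 rather than rounded decimals.

```python
from collections import Counter
from fractions import Fraction


def category_utility(clusters):
    """Category utility of a partition, for one nominal attribute."""
    all_values = [v for cluster in clusters for v in cluster]
    n = len(all_values)
    overall = Counter(all_values)
    total = Fraction(0)
    for cluster in clusters:
        pr_cluster = Fraction(len(cluster), n)
        within = Counter(cluster)
        # Sum over attribute values of Pr[value | cluster]^2 - Pr[value]^2.
        gain = sum(
            Fraction(within[v], len(cluster)) ** 2 - Fraction(overall[v], n) ** 2
            for v in overall
        )
        total += pr_cluster * gain
    # Divide by the number of clusters k.
    return total / len(clusters)


# Hypothetical marble sets matching the probabilities of examples (1), (2), (3).
case1 = [["big", "small", "small"], ["big", "small", "small"]]
case2 = [["big", "big", "small"], ["small", "small", "small"]]
case3 = [["big", "big"], ["small", "small", "small", "small"]]

for case in (case1, case2, case3):
    print(category_utility(case))
```

The value rises as the clusters become better predictors of size: a split that ignores size scores 0, while a split that separates big from small perfectly scores highest of the three.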
Cobweb Example
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
Cobweb Result Example
attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
Cobweb Numerical
Probability of values of attributes of instances in a cluster is based on the standard deviation from the estimate for the mean value
Acuity is presumed variance in attribute values
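For reference, one common textbook formulation of category utility for numeric attributes is given below. It is an assumption that this is the variant the slide has in mind. Here σ_il is the standard deviation of attribute a_i within cluster C_l and σ_i is its standard deviation over all instances, with each attribute assumed to be normally distributed.

```latex
CU(C_1, \ldots, C_k) = \frac{1}{k} \sum_{l} \Pr[C_l] \, \frac{1}{2\sqrt{\pi}} \sum_{i} \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_{i}} \right)
```

In this form a cluster holding a single instance would have σ_il = 0 and an unbounded utility, which is why a presumed variance such as the acuity is needed as a floor.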
Disadvantages of Previous Methods
Fast and hard to judge
Dependent on initial setup
Ad-hoc limitations
Hard to escape from local minima
Probability-based Clustering
Finite mixture models
Each cluster is defined by a vector of probabilities for instances to have certain values for their attributes, and a probability for instances to reside in the cluster
Clustering equals searching for optimal sets of probabilities for a sample set
Expectation-Maximisation (EM)
Probability-based clustering
Divisive
Polythetic
Predefined number of clusters (k)
Outliers permitted
EM Procedure
Select k cluster vectors randomly
Calculate cluster probabilities for each instance (under the assumption that the instance attributes are independent)
Use calculations to re-estimate values
Repeat until increase in quality becomes negligible
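The four steps can be sketched for binary attributes as a mixture of independent yes/no probabilities. This is an illustrative sketch under stated assumptions: "quality" is taken to be the log-likelihood of the sample, the marble list is the entity matrix as read earlier, and the function name, seed and tolerance are hypothetical choices.

```python
import math
import random


def em_bernoulli(data, k, max_passes=100, tol=1e-6, seed=0):
    """EM for a mixture of k clusters over independent binary attributes."""
    rng = random.Random(seed)
    n_attr = len(data[0])
    # Step 1: random cluster vectors, equal cluster probabilities.
    weights = [1.0 / k] * k
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_attr)] for _ in range(k)]
    history, resp = [], []
    for _ in range(max_passes):
        # Step 2: cluster probabilities for each instance (attributes independent).
        resp, loglik = [], 0.0
        for x in data:
            joint = []
            for c in range(k):
                p = weights[c]
                for value, q in zip(x, probs[c]):
                    p *= q if value else 1.0 - q
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(loglik)
        # Step 4: stop once the increase in quality becomes negligible.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # Step 3: re-estimate cluster probabilities and attribute probabilities.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            if mass == 0:
                continue  # leave an empty cluster unchanged
            weights[c] = mass / len(data)
            probs[c] = [
                sum(r[c] * x[i] for r, x in zip(resp, data)) / mass
                for i in range(n_attr)
            ]
    return weights, probs, resp, history


marbles = (
    ["NYYY"] * 5 + ["NNNN", "YNNN", "YYYN"]
    + ["NYYN"] * 5 + ["NNYY", "YNYY"]
)
points = [tuple(1 if ch == "Y" else 0 for ch in m) for m in marbles]
weights, probs, resp, history = em_bernoulli(points, k=2)
print([round(w, 3) for w in weights])
```

Unlike k-means, each marble keeps a probability of belonging to every cluster, and the outcome still depends on the random starting vectors.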
EM Result Example
Cluster 1: p(C1) = 0.2, p(big) = 0.6, p(monochrome) = 0.3, p(shiny) = 0.4, p(transparent) = 0.4
Cluster 2: p(C2) = 0.8, p(big) = 0.2, p(monochrome) = 0.8, p(shiny) = 0.9, p(transparent) = 0.5
Cluster 1                       Cluster 2
.2*.4*.3*.4*.6 = 0.0058         .8*.8*.8*.9*.5 = 0.2304
.2*.4*.7*.6*.6 = 0.0202         .8*.8*.2*.1*.5 = 0.0064
.2*.6*.7*.4*.4 = 0.0134         .8*.2*.2*.9*.5 = 0.0144
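The products above can be recomputed and normalised into cluster memberships with a short sketch. The three marble descriptions are inferred from the factors in each product (for example, a factor of .4 where p(big) = 0.6 implies a small marble); they are assumptions, since the slide's pictures are not preserved.

```python
# Cluster parameters from the slide:
# prior, then P(big), P(monochrome), P(shiny), P(transparent).
clusters = {
    "C1": (0.2, (0.6, 0.3, 0.4, 0.4)),
    "C2": (0.8, (0.2, 0.8, 0.9, 0.5)),
}


def joint(marble, prior, probs):
    """Prior times the probability of each attribute value, assuming independence."""
    p = prior
    for present, q in zip(marble, probs):
        p *= q if present else 1.0 - q
    return p


def memberships(marble):
    """Return the joint value per cluster and the normalised membership per cluster."""
    joints = {
        name: joint(marble, prior, probs)
        for name, (prior, probs) in clusters.items()
    }
    total = sum(joints.values())
    return joints, {name: j / total for name, j in joints.items()}


# Marbles inferred from the products (assumption), coded as
# (big, monochrome, shiny, transparent).
marbles = {
    "small, monochrome, shiny, opaque": (0, 1, 1, 0),
    "small, polychrome, dull, opaque": (0, 0, 0, 0),
    "big, polychrome, shiny, transparent": (1, 0, 1, 1),
}

for name, marble in marbles.items():
    joints, member = memberships(marble)
    print(name, {c: round(j, 4) for c, j in joints.items()},
          {c: round(m, 2) for c, m in member.items()})
```

Normalising each pair of products gives the probability that the marble belongs to each cluster, which is the quantity the re-estimation step of EM uses.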
The Essence of Classification
A successful classification defines fundamental characteristics
A classification can never be better than the attributes it is based upon
There is no magic formula