Transcript of Introduction to Machine Learning
udel.edu/~amotong/teaching/machine learning/lectures/(Lec 13...

Page 1:

Introduction to Machine Learning


Page 2:

Lecture 13: Unsupervised Learning

• K-means Framework

• Cut-based Framework

• Agglomerative Framework

• Divisive Framework

• Some materials are courtesy of Vibhav Gogate, Carlos Guestrin, Dan Klein & Luke Zettlemoyer, Eric Xing, and Hastie.

• All pictures belong to their creators.


Page 3:

Machine Learning

Supervised Learning 𝒇(𝒙)

• Parametric: regression vs. classification, continuous vs. discrete, linear vs. non-linear. Methods: linear regression, decision trees, neural networks, ...

• Non-parametric: instance-based learning (kNN)

Reinforcement Learning

Unsupervised Learning

• Clustering

Page 4:

Clustering

• Input: some data

• Goal: infer group information

Page 5:

Clustering

• Input: some data

• Goal: infer group information

• E.g., group emails, group search results, detect styles.

source : http://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/_images/plot_cluster_comparison_11.png

Page 6:

Clustering

• Input: some data

• Goal: infer group information

• E.g., group emails, group search results, detect styles.

Edge Foci Interest Points. DOI: 10.1109/ICCV.2011.6126263

Page 7:

Clustering (Eric Xing)

• Input: some data

• Goal: infer group information

• Clustering is subjective.

Page 8:

Clustering

• Input: some data

• Goal: infer group information

• Clustering is subjective.

• Similarity: how should it be measured?

• Output:

• a partition of the data

• some pattern that reflects the group information

Page 9:

Clustering

• Input: some data

• Goal: infer group information

• E.g., group emails, group search results, detect styles.

• We have data, but there are no labels.

• We do not know how many clusters there are.

• We do not know which data belongs to which cluster.

• We do not even know if the hidden pattern exists.

• BUT we never give up…

Page 10:

Clustering

• BUT we never give up…

• Partition-based framework

• Hierarchical clustering framework

Page 11:

K-means Framework

• We have some data.

• We can define (a) the similarity between two instances and (b) the center of a set of instances.

• E.g., Euclidean space (real vectors)

• Distance: $\mathrm{dist}(x_1, x_2) = \|x_1 - x_2\|_2$

• Similarity = 1/distance

• Center of $x_1, \dots, x_n$: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

(A small numeric illustration follows.)
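As a quick illustration of these definitions, here is a minimal NumPy sketch; the points and values are made up for the example:

```python
import numpy as np

x1, x2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
dist = np.linalg.norm(x1 - x2)   # Euclidean distance ||x1 - x2||_2 = 5.0
sim = 1.0 / dist                 # similarity = 1 / distance = 0.2

X = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, -4.0]])
center = X.mean(axis=0)          # center = (1/n) * sum_i x_i = [2., 0.]
```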

Page 12:

K-means Framework

• We have some data.

• We can define (a) the similarity between two instances and (b) the center of a set of instances.

• Suppose there are 𝑘 clusters.

• Randomly select 𝑘 centers

• Repeat

• Assign each instance to the closest center. (now we have 𝑘 clusters)

• Recompute the center of each cluster.

• Until convergence or another stopping criterion is met. (A code sketch follows.)
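Here is a minimal sketch of this loop in Python/NumPy. The function name, the initialization scheme (picking k data points), and the stopping test are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd-style k-means sketch. X: (n, d) data matrix."""
    rng = np.random.default_rng(seed)
    # Randomly select k data points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each instance to the closest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute the center (mean) of each cluster; keep the old center
        # if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return labels, centers
```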

Page 13:

K-means Framework (Bishop)

• Example (Euclidean space)

Suppose k = 2. Step 1: randomly pick two centers.

Page 14:

K-means Framework (Bishop)

• Example (Euclidean space)

Suppose k = 2. Step 1: randomly pick two centers. Step 2: assign points to the closest center.

Page 15:

K-means Framework (Bishop)

• Example (Euclidean space)

Suppose k = 2. Step 1: randomly pick two centers. Step 2: assign points to the closest center. Step 3: calculate the center of each cluster.

Page 16:

K-means Framework (Bishop)

• Example (Euclidean space)

Suppose k = 2. Step 1: randomly pick two centers. Step 2: assign points to the closest center. Step 3: calculate the center of each cluster. Step 4: assign points to the closest center.

Repeat until convergence.

Page 17:

K-means Framework (Bishop)

• Example (Image Segmentation)

• Formally, partition an image into regions, each of which has a reasonably homogeneous visual appearance.

• Informally, identify the main elements in an image.

Pixel and color.

Pages 18–21: K-means Framework (Bishop). Example (Image Segmentation), continued (figures).

Page 22:

K-means Framework

• Repeat

• Update the assignment.

• Update the means (centers).

• Until convergence or another stopping criterion is met

Page 23:

K-means Framework

• Repeat

• Update the assignment.

• Update the means (centers).

• Until convergence or another stopping criterion is met

• Given the assignment 𝐶, let 𝐶(𝑥) be the mean (center) of the cluster containing 𝑥. Consider the Euclidean distance.

• Will it converge? Yes!

• Consider a potential function $f = \sum_{x \in D} \mathrm{dist}(x, C(x))$.

• 𝑓 never increases and 𝑓 is bounded below, so it will converge. (A numerical check follows.)
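A small self-contained check of this argument on made-up data. It uses the squared Euclidean distance, for which the mean-update step provably does not increase 𝑓, and it ignores the rare empty-cluster case by keeping the old center:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                      # toy data
k = 3
centers = X[rng.choice(len(X), k, replace=False)]  # random initial centers
fs = []
for _ in range(20):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)                     # update the assignment
    fs.append(d2[np.arange(len(X)), labels].sum()) # potential f after assignment
    centers = np.array([                           # update the means
        X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
        for j in range(k)])

assert all(a >= b - 1e-9 for a, b in zip(fs, fs[1:]))  # f never increases
```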

Page 24:

K-means Framework

• Given the assignment 𝐶, let 𝐶(𝑥) be the mean (center) of the cluster containing 𝑥. Consider the Euclidean distance.

• Repeat

• Update the assignment.

• Update the means (centers).

• Until convergence or another stopping criterion is met

• Updating the assignment will not increase 𝒇.

• Recalculating the means will not increase 𝒇.

• For a fixed cluster, which point minimizes the distance sum?

• Try the Lagrange multiplier method (do it yourself). (A worked version follows.)

$f = \sum_{x \in D} \mathrm{dist}(x, C(x))$, where $\mathrm{dist}(x_1, x_2) = \|x_1 - x_2\|_2$
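One worked version of this step. A subtlety the slide leaves implicit: the cluster mean is the exact minimizer when the distance is the *squared* Euclidean distance, and since the minimization is unconstrained, setting the gradient to zero suffices (no Lagrange multipliers needed):

```latex
% Minimize the within-cluster sum of squared distances over the center \mu:
g(\mu) = \sum_{i=1}^{n} \| x_i - \mu \|_2^2
% Set the gradient to zero:
\nabla_\mu\, g(\mu) = \sum_{i=1}^{n} 2(\mu - x_i) = 0
\quad\Longrightarrow\quad
n\,\mu = \sum_{i=1}^{n} x_i
\quad\Longrightarrow\quad
\mu = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .
```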

Page 25:

K-means Framework

• Simple

• Intuitive: (implicitly) minimizes $\sum_{x \in D} \mathrm{dist}(x, C(x))$

• Not time consuming: O(tkn) (k: number of clusters, t: number of iterations, n: number of instances)

Page 26:

K-means Framework

• Simple

• Intuitive: (implicitly) minimizes $\sum_{x \in D} \mathrm{dist}(x, C(x))$

• Not time consuming: O(tkn) (k: number of clusters, t: number of iterations, n: number of instances)

• K-means may converge to a local optimum.

• How many clusters are there?

• Distance between clusters?

• How to define the mean? What if the attributes are not real numbers?

• Cannot handle noise.

• Not suitable for non-convex patterns. (Recall the decision patterns of kNN.)


Page 28:

Cut-based Clustering

• Two intuitions behind a good clustering.

• (a) the connections between objects in different clusters should be weak

• (b) the connections between objects within a cluster should be strong

Page 29:

Cut-based Clustering

• Two intuitions behind a good clustering.

• (a) the connections between objects in different clusters should be weak

• (b) the connections between objects within a cluster should be strong

• Ground set $U = \{v_1, \dots, v_n\}$

• Similarity between two elements: $\mathrm{sim}(v_i, v_j)$

• A partition $C_1, \dots, C_k$ of $U$

• Inner-sim$(C_i) = \sum_{u, v \in C_i} \mathrm{sim}(u, v)$

• Inter-sim$(C_i) = \sum_{u \in C_i,\, v \notin C_i} \mathrm{sim}(u, v)$ (the cut)

How do we measure the goodness of a clustering? Cost of a clustering $C_1, \dots, C_k$:

$$\mathrm{cost} = \sum_{i} \frac{\text{Inter-sim}(C_i)}{\text{Inner-sim}(C_i)}$$
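A minimal sketch of these quantities in Python, assuming a symmetric similarity matrix with a zero diagonal; each unordered pair is counted once in Inner-sim, which matches the worked example on pages 32–34. The function name is made up:

```python
import numpy as np

def clustering_cost(S, labels):
    """cost = sum_i Inter-sim(C_i) / Inner-sim(C_i).

    S: (n, n) symmetric similarity matrix (zero diagonal).
    labels: cluster id for each element.
    Singletons: Inner-sim is treated as infinite, so their term is 0.
    """
    cost = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        if in_c.sum() == 1:
            continue                               # singleton contributes 0
        inner = S[np.ix_(in_c, in_c)].sum() / 2.0  # each unordered pair once
        inter = S[np.ix_(in_c, ~in_c)].sum()       # the cut
        cost += inter / inner
    return cost
```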

Page 30:

Cut-based Clustering

• Two intuitions behind a good clustering.

• Ground set $U = \{v_1, \dots, v_n\}$

• Similarity between two elements: $\mathrm{sim}(v_i, v_j)$

• A partition $C_1, \dots, C_k$ of $U$

• Inner-sim$(C_i) = \sum_{u, v \in C_i} \mathrm{sim}(u, v)$

• Inter-sim$(C_i) = \sum_{u \in C_i,\, v \notin C_i} \mathrm{sim}(u, v)$ (the cut)

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

An optimal solution exists, but it is hard to find. Enumerating? Polynomial time?

Page 31:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• Initialize 𝐶1, … , 𝐶𝑘 randomly.

• Repeat until converged

• Unlock all elements

• Repeat until all elements are locked.

• Randomly select one 𝐶𝑖.

• Randomly select one unlocked element 𝑣 ∈ 𝐶𝑖, if any.

• Move 𝑣 to the cluster such that 𝒄𝒐𝒔𝒕 is maximally decreased.

• Lock 𝑣. (A code sketch of this heuristic follows.)
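A hedged sketch of this local-move heuristic. The cost helper is repeated so the snippet stands alone; recomputing the full cost for every candidate move is wasteful but keeps the sketch short. Names and the convergence test are illustrative:

```python
import numpy as np

def clustering_cost(S, labels, k):
    """cost = sum_i Inter-sim(C_i) / Inner-sim(C_i); singletons contribute 0."""
    cost = 0.0
    for c in range(k):
        in_c = labels == c
        if in_c.sum() <= 1:
            continue
        cost += S[np.ix_(in_c, ~in_c)].sum() / (S[np.ix_(in_c, in_c)].sum() / 2.0)
    return cost

def cut_based_clustering(S, k, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(S))   # initialize C_1..C_k randomly
    improved = True
    while improved:                            # repeat until no pass improves the cost
        improved = False
        # "Unlock all elements": each element is considered once per pass, then locked.
        for v in rng.permutation(len(S)):
            best_c = labels[v]
            best_cost = clustering_cost(S, labels, k)
            for c in range(k):                 # find the cost-minimizing destination
                labels[v] = c
                cst = clustering_cost(S, labels, k)
                if cst < best_cost:
                    best_c, best_cost, improved = c, cst, True
            labels[v] = best_c                 # move v, then lock it
    return labels
```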

Page 32:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• Example. k = 2. Three elements a, b, c with pairwise similarities sim(a, b) = 1, sim(a, c) = 3, sim(b, c) = 2. Initial clustering: {a} and {b, c}.

Cost = 0 + (3 + 1)/2 = 2

• Inner-sim(𝐶𝑖) = ∞ if |𝐶𝑖| = 1 (so a singleton cluster contributes 0 to the cost).

• Or you can do some smoothing by assigning a base similarity.

Page 33:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• Example. k = 2. Elements a, b, c with sim(a, b) = 1, sim(a, c) = 3, sim(b, c) = 2. Initial clustering: {a} and {b, c}.

Cost = 0 + (3 + 1)/2 = 2

If we move c to a's cluster: cost = (1 + 2)/3 + 0 = 1

Inner-sim(𝐶𝑖) = ∞ if |𝐶𝑖| = 1

Page 34:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• Example. k = 2. Elements a, b, c with sim(a, b) = 1, sim(a, c) = 3, sim(b, c) = 2. Initial clustering: {a} and {b, c}.

Cost = 0 + (3 + 1)/2 = 2

If we move c to a's cluster: cost = (1 + 2)/3 + 0 = 1

If we move b to a's cluster: cost = (3 + 2)/1 + 0 = 5

So moving c is the best move. (These numbers are checked in the snippet below.)

Inner-sim(𝐶𝑖) = ∞ if |𝐶𝑖| = 1
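Assuming the clustering_cost sketch from page 29 is in scope, the three costs above can be reproduced with the similarity values read off this example:

```python
import numpy as np

# a=0, b=1, c=2; sim(a,b)=1, sim(a,c)=3, sim(b,c)=2
S = np.array([[0, 1, 3],
              [1, 0, 2],
              [3, 2, 0]], dtype=float)

print(clustering_cost(S, np.array([0, 1, 1])))  # {a},{b,c}: 0 + (3+1)/2 = 2.0
print(clustering_cost(S, np.array([0, 1, 0])))  # move c:    (1+2)/3 + 0 = 1.0
print(clustering_cost(S, np.array([0, 0, 1])))  # move b:    (3+2)/1 + 0 = 5.0
```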

Page 35:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• A heuristic algorithm.

• May not be optimal.

• Is the solution good?

• Reasonable: the cost is iteratively decreased.

• Does it converge?

• Yes.

Page 36:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• Initialize 𝐶1, … , 𝐶𝑘 randomly.

• Repeat until converged. (Does this loop converge?)

• Unlock all elements

• Repeat until all elements are locked. (Does this loop converge?)

• Randomly select one 𝐶𝑖.

• Randomly select one unlocked element 𝑣 ∈ 𝐶𝑖, if any.

• Move 𝑣 to the cluster such that 𝒄𝒐𝒔𝒕 is maximally decreased.

• Lock 𝑣.

Page 37:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• Heuristic algorithm.

• May not be optimal.

• Is the solution good?

• Reasonable. Cost is iteratively decreased.

• Does it converge?

• Yes.

• Any other choices?

• Yes

Page 38:

Cut-based Clustering

• Find a clustering that minimizes $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$.

• An algorithm

• Initialize 𝐶1, … , 𝐶𝑘 randomly.

• Repeat until converged

• Unlock all elements

• Repeat until all elements are locked.

• Randomly select one 𝐶𝑖.

• Randomly select one unlocked element 𝑣 ∈ 𝐶𝑖, if any.

• Move 𝑣 to the cluster such that 𝒄𝒐𝒔𝒕 is maximally decreased.

• Lock 𝑣.

Alternatively, you may select the element that, once moved, maximally decreases the cost.

Page 39:

Cut-based Clustering

• Compared to k-means:

• In both, the number of clusters is known in advance.

• Both need some initialization.

• Both iteratively improve the solution.

• Cut-based: considers both the inter- and the inner-cluster similarity.

• K-means: considers only the inner-cluster similarity.

Page 40:

Agglomerative Clustering

• Idea: combine small clusters.

Page 41:

Agglomerative Clustering

• Idea: combine small clusters.

• Framework:

• Maintain a set of clusters

• Initially, each instance is one cluster

• Repeat

• Merge two closest clusters

• Until there is one cluster

• Key: how to define closeness of clusters?

Page 42:

Agglomerative Clustering

• Key: how to define the closeness of two clusters?

• First, define the closeness of each pair of instances.

• The closeness of two clusters can then be:

• The closest pair (single-link clustering)

• The farthest pair (complete-link clustering; the diameter)

• The sum of all pairs? The average of all pairs (average-link clustering).

• Ward's method:

• If you can define the distance within a cluster, merge the pair of clusters whose merge results in the minimum increase in within-cluster distance. (A code sketch of single/complete linkage follows.)
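A naive sketch of the agglomerative loop with single-link or complete-link closeness; it is O(n³)-ish and purely illustrative, and the function name and the merge-history return format are made up:

```python
import numpy as np
from itertools import combinations

def agglomerative(D, linkage="single"):
    """D: (n, n) pairwise distance matrix. Returns the merge history:
    a list of (cluster_a, cluster_b, distance) tuples, i.e., a dendrogram."""
    clusters = {i: [i] for i in range(len(D))}  # initially, each instance is a cluster
    agg = min if linkage == "single" else max   # closest pair vs. farthest pair
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters under the chosen linkage.
        (a, b), d = min(
            (((a, b), agg(D[i][j] for i in clusters[a] for j in clusters[b]))
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[1])
        merges.append((a, b, d))
        clusters[a] += clusters.pop(b)          # merge the two closest clusters
    return merges
```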

Page 43:

Agglomerative Clustering (Hastie)

• The result of agglomerative clustering is a hierarchy of clusters: a dendrogram.

So what if we want k clusters? Cut the dendrogram at the level where exactly k branches remain.

Page 44:

Agglomerative Clustering

• Dendrograms can also be used to detect outliers (e.g., a point that merges only at a large distance).

Page 45:

Divisive Clustering

• Idea: split a large cluster into two

Page 46:

Divisive Clustering

• Idea: split a large cluster into two

• Framework:

• Maintain a set of clusters

• Initially, all instances form one cluster

• Repeat

• Split one cluster into two

• Until each cluster is a singleton.

• Key: Which cluster should we split? How to split it?

Page 47:

Divisive Clustering (Andrea)

Key: Which cluster should we split? How to split it?

Page 48:

Divisive Clustering

• Idea: split a large cluster into two

• Framework:

• Maintain a set of clusters

• Initially, all instances form one cluster

• Repeat

• Split one cluster into two

• Until each cluster is a singleton.

Which cluster should we split?

• If we grow the entire dendrogram and the splitting rule is local, it does not matter.

• Otherwise, you may select the one with the highest cost.

How to split it? (Many choices.)

• Partition it into two equal parts such that the cost is minimized.

• DIANA.

Page 49:

Divisive Clustering

• Idea: split a large cluster into two

• Framework:

• Maintain a set of clusters

• Initially, all instances form one cluster

• Repeat

• Split one cluster into two

• Until each cluster is a singleton.

Which cluster should we split?

• If we grow the entire dendrogram and the splitting rule is local, it does not matter.

• Otherwise, you may select the one with the highest cost.

How to split it? (Many choices.)

• Partition it into two equal parts such that the cost is minimized.

• DIANA:

DIANA: to divide the selected cluster, the algorithm first looks for its most disparate observation (i.e., the one with the largest average dissimilarity to the other observations of the selected cluster). This observation initiates the "splinter group". In subsequent steps, the algorithm reassigns observations that are closer to the "splinter group" than to the "old party". The result is a division of the selected cluster into two new clusters. (A code sketch follows.)
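A minimal sketch of one DIANA split following the description above; the function name and the return format are illustrative, and the cluster to split is assumed to have at least two members:

```python
import numpy as np

def diana_split(D, members):
    """One DIANA split. D: (n, n) dissimilarity matrix over all objects;
    members: indices of the cluster to split. Returns (splinter, old_party)."""
    old = list(members)
    # The most disparate observation (largest average dissimilarity to the
    # rest of the selected cluster) initiates the "splinter group".
    avg = [np.mean([D[i, j] for j in old if j != i]) for i in old]
    splinter = [old.pop(int(np.argmax(avg)))]
    moved = True
    while moved and len(old) > 1:
        moved = False
        for i in list(old):
            d_old = np.mean([D[i, j] for j in old if j != i])
            d_spl = np.mean([D[i, j] for j in splinter])
            if d_spl < d_old:        # closer to the splinter group: reassign
                old.remove(i)
                splinter.append(i)
                moved = True
    return splinter, old
```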

Page 50:

Hierarchical Clustering - Summary

• No need to specify the number of clusters in advance.

• Can be time consuming: the time complexity is at least O(𝑛²), where 𝑛 is the total number of objects.

• The hierarchical structure matches intuition in some domains.

• But the interpretation is subjective.

Page 51:

Summary

• K-means

• Cut-based clustering

• Agglomerative clustering

• Divisive clustering

Page 52:

Equal-sized k-clustering

Cut-based k-clustering: $\mathrm{cost} = \sum_i \text{Inter-sim}(C_i) / \text{Inner-sim}(C_i)$

• Initialize 𝐶1, … , 𝐶𝑘 randomly.

• Repeat until converged.

• Unlock all elements.

• Repeat until all elements are locked.

• Randomly select one 𝐶𝑖.

• Randomly select one unlocked element 𝑣 ∈ 𝐶𝑖, if any.

• Move 𝑣 to the cluster such that 𝒄𝒐𝒔𝒕 is maximally decreased.

• Lock 𝑣.

Exercise: given a set of 𝑘 ⋅ 𝑚 elements, we want an equal-sized k-clustering; that is, each cluster has exactly 𝑚 elements. Please describe a cut-based algorithm for this purpose.

Hint: how can you take the size of the clusters into account?