
Introduction to Machine Learning

Amo G. Tong

Lecture 13: Unsupervised Learning

• K-means Framework

• Cut-based Framework

• Agglomerative Framework

• Divisive Framework

• Some materials are courtesy of Vibhav Gogate, Carlos Guestrin, Dan Klein & Luke Zettlemoyer, Eric Xing, and Hastie.

• All pictures belong to their creators.


Machine Learning

• Supervised Learning 𝑓(𝑥)

• Parametric: regression vs. classification, continuous vs. discrete, linear vs. non-linear. Methods: linear regression, decision tree, neural network, ...

• Non-parametric: instance-based learning (kNN)

• Unsupervised Learning: clustering

• Reinforcement Learning


Clustering

• Input: some data

• Goal: infer group information


Clustering

• Input: some data

• Goal: infer group information

• E.g., grouping emails, grouping search results, detecting styles.

source : http://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/_images/plot_cluster_comparison_11.png


Clustering

• Input: some data

• Goal: infer group information

• E.g., grouping emails, grouping search results, detecting styles.

Edge Foci Interest Points, DOI: 10.1109/ICCV.2011.6126263


Clustering (Eric Xing)

• Input: some data

• Goal: infer group information

• Clustering is subjective.


Clustering

• Input: some data

• Goal: infer group information

• Clustering is subjective.

• Similarity? (How should it be measured?)

• Output:

• a partition

• in which some pattern reflects the group information.


Clustering

• Input: some data

• Goal: infer group information

• E.g., grouping emails, grouping search results, detecting styles.

• We have data, but there are no labels.

• We do not know how many clusters there are.

• We do not know which data point belongs to which cluster.

• We do not even know whether a hidden pattern exists.

• BUT we never give up.


Clustering

• BUT we never give up:

• Partition-based framework

• Hierarchical clustering framework


K-means Framework

• We have some data.

• We can define (a) the similarity between two instances and (b) the center of a set of instances.

• E.g., Euclidean space (real vectors):

• Distance: dist(𝑥1, 𝑥2) = ‖𝑥1 − 𝑥2‖²

• Similarity = 1 / distance

• Center of 𝑥1, … , 𝑥𝑛: the mean (1/𝑛) Σ𝑖 𝑥𝑖


K-means Framework

• We have some data.

• We can define (a) the similarity between two instances and (b) the center of a set of instances.

• Suppose there are 𝑘 clusters.

• Randomly select 𝑘 centers

• Repeat

• Assign each instance to the closest center. (now we have 𝑘 clusters)

• Recompute the center of each cluster.

• Until convergence or another stopping criterion is met (a minimal code sketch follows below)
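As a concrete illustration, here is a minimal NumPy sketch of this framework for points in Euclidean space. The function name, the choice of initial centers (k random data points), and the stopping test (centers stop moving) are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal k-means for the rows of X (n x d), squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Randomly select k data points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: each instance goes to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute the center (mean) of each cluster.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # centers stopped moving: converged
            break
        centers = new_centers
    return labels, centers
```

For example, kmeans(np.random.rand(100, 2), k=3) returns a cluster label for each of the 100 points together with the 3 centers.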


K-means Framework (Bishop)

• Example (Euclidean space). Suppose 𝑘 = 2.

• Step 1: randomly pick two centers.

• Step 2: assign each point to the closest center.

• Step 3: recompute the center of each cluster.

• Step 4: reassign points to the closest center.

• Repeat until convergence.

• Example (Image Segmentation)

• Formally: partition an image into regions, each of which has a reasonably homogeneous visual appearance.

• Informally: identify the main elements in an image.


K-means Framework (Bishop)

• Pixels and colors: treat each pixel as a data point described by its color values, and run k-means on the pixels; each resulting cluster is a region of similar color.
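A sketch of that idea in code, reusing the kmeans function above and assuming the image is an H x W x 3 RGB array; the function name and the repaint-with-the-center-color step are my own choices for illustration.

```python
import numpy as np

def segment_image(image, k=3):
    """Cluster pixel colors with k-means and repaint each pixel with its cluster's center color."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)  # one row per pixel: its color
    labels, centers = kmeans(pixels, k)          # kmeans from the earlier sketch
    return centers[labels].reshape(h, w, c)      # each pixel -> the center of its cluster
```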



K-means Framework

• Repeat

• Update the assignment.

• Update the means (centers).

• Until convergence or another stopping criterion is met


K-means Framework

• Repeat

• Update the assignment.

• Update the means (centers).

• Until convergence or another stopping criterion is met

• Given the assignment 𝐶, let 𝐶(𝑥) be the mean (center) of the cluster containing 𝑥. Consider the Euclidean distance.

• Will it converge? Yes!

• Consider the potential function 𝑓 = Σ𝑥∈𝐷 dist(𝑥, 𝐶(𝑥))

• 𝑓 never increases and 𝑓 is bounded below (by 0) => it converges


K-means Framework

• Given the assignment 𝐶, let 𝐶(𝑥) be the mean (center) of the cluster containing 𝑥. Consider the Euclidean distance.

• Repeat

• Update the assignment.

• Update the means (centers).

• Until convergence or another stopping criterion is met

• Updating the assignment will not increase 𝒇.

• Recomputing the means will not increase 𝒇.

• For a fixed cluster, which center minimizes the sum of distances?

• Try the Lagrange multiplier method (do it yourself), or simply set the gradient to zero; see the derivation sketched below.

𝑓 = Σ𝑥∈𝐷 dist(𝑥, 𝐶(𝑥)), where dist(𝑥1, 𝑥2) = ‖𝑥1 − 𝑥2‖²
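A sketch of that "do it yourself" step for a single cluster 𝐶; here the minimizer is found simply by setting the gradient to zero, since the problem is unconstrained.

```latex
% For a fixed cluster C, find the center \mu minimizing the within-cluster sum of squared distances.
\[
  g(\mu) = \sum_{x \in C} \lVert x - \mu \rVert^2, \qquad
  \nabla_{\mu}\, g(\mu) = -2 \sum_{x \in C} (x - \mu) = 0
  \;\Longrightarrow\;
  \mu = \frac{1}{|C|} \sum_{x \in C} x .
\]
% Hence recomputing each center as the cluster mean can only decrease f,
% and the assignment step never increases f either, so f converges.
```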


K-means Framework

• Simple.

• Intuitive: implicitly minimizes 𝑓 = Σ𝑥∈𝐷 dist(𝑥, 𝐶(𝑥)).

• Not time consuming: O(𝑡𝑘𝑛) (𝑘: number of clusters, 𝑡: number of iterations, 𝑛: number of instances).

• K-means may converge to a local optimum (see the sketch below).

• How many clusters are there?

• How should the distance between clusters be defined?

• How to define the mean? What if the attributes are not real numbers?

• Cannot handle noise well.

• Not suitable for non-convex patterns (recall the decision pattern of kNN).
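One way to see the local-optimum issue is to run the kmeans sketch from above from several random initializations and compare the final objective values; the data set and seeds here are made up purely for illustration.

```python
import numpy as np

def kmeans_objective(X, labels, centers):
    """f = sum over instances of the squared distance to the assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))

# Three well-separated blobs; a poor initialization can still split one blob in two.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.2, size=(50, 2)) for loc in ([0, 0], [5, 0], [0, 5])])

for seed in range(5):
    labels, centers = kmeans(X, k=3, seed=seed)
    print(seed, kmeans_objective(X, labels, centers))  # objectives may differ across seeds
```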

Cut-based Clustering

• Two intuitions behind a good clustering.

• (a) weaken the connection between objects in different clusters

• (b) strengthen the connection between objects within a cluster


Cut-based Clustering

• Two intuitions behind a good clustering.

• (a) weaken the connection between objects in different clusters

• (b) strengthen the connection between objects within a cluster

• Ground set 𝑈 = {𝑣1, … 𝑣𝑛}

• Similarity between two elements 𝑠𝑖𝑚(𝑣𝑖 , 𝑣𝑗)

• A partition 𝐶1, … , 𝐶𝑘 of 𝑈

• Inner-sim(𝐶𝑖) = Σ𝑢,𝑣∈𝐶𝑖 sim(𝑢, 𝑣)

• Inter-sim(𝐶𝑖) = Σ𝑢∈𝐶𝑖, 𝑣∉𝐶𝑖 sim(𝑢, 𝑣) (the cut)

How to measure the goodness of a clustering?

Cost of a clustering 𝐶1, … , 𝐶𝑘: cost = Σ𝑖 Inter-sim(𝐶𝑖) / Inner-sim(𝐶𝑖)
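These definitions translate directly into code. The sketch below assumes the similarity is given as a symmetric matrix sim[i][j], counts each pair once, and uses the convention from the later example slides that a singleton cluster has Inner-sim = ∞ (so it contributes 0 to the cost); the function names are mine.

```python
def inner_sim(sim, cluster):
    """Sum of similarities over unordered pairs inside one cluster."""
    c = list(cluster)
    return sum(sim[c[i]][c[j]] for i in range(len(c)) for j in range(i + 1, len(c)))

def inter_sim(sim, cluster, universe):
    """Sum of similarities across the cut between the cluster and the rest."""
    outside = set(universe) - set(cluster)
    return sum(sim[u][v] for u in cluster for v in outside)

def clustering_cost(sim, clusters):
    """cost = sum_i Inter-sim(C_i) / Inner-sim(C_i); singletons contribute 0."""
    universe = set().union(*[set(c) for c in clusters])
    total = 0.0
    for c in clusters:
        if len(c) <= 1:  # convention: Inner-sim of a singleton is infinity
            continue
        total += inter_sim(sim, c, universe) / inner_sim(sim, c)
    return total
```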


Cut-based Clustering


• Find a clustering that minimizes 𝐜𝐨𝐬𝐭 = Σ𝑖 Inter-sim(𝐶𝑖) / Inner-sim(𝐶𝑖).

• An optimal solution exists, but it is hard to find. Enumerate all partitions? Is there a polynomial-time algorithm?


Cut-based Clustering

• Find a clustering that minimizes 𝐜𝐨𝐬𝐭 = Σ𝑖 Inter-sim(𝐶𝑖) / Inner-sim(𝐶𝑖).

• An algorithm

• Initialize 𝐶1, … , 𝐶𝑘 randomly.

• Repeat until convergence

• Unlock all elements

• Repeat until all elements are locked.

• Randomly select one cluster 𝐶𝑖.

• Randomly select one unlocked element 𝑣 ∈ 𝐶𝑖, if any.

• Move 𝑣 to the cluster such that 𝒄𝒐𝒔𝒕 is maximally decreased.

• Lock 𝑣.
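A sketch of this heuristic in code, reusing clustering_cost from above. For simplicity it picks an unlocked element directly at random (rather than first picking a cluster), moves it only if some move strictly decreases the cost, and stops when a full pass moves nothing; these are my own choices, not details fixed by the slides.

```python
import random

def current_cost(sim, assign, k):
    """Cost of the clustering encoded by assign[i] = cluster index of element i."""
    clusters = [{i for i, a in enumerate(assign) if a == c} for c in range(k)]
    return clustering_cost(sim, [c for c in clusters if c])

def cut_based_clustering(sim, k, seed=0, max_passes=20):
    """Local-move heuristic: repeatedly move single elements to reduce the cost."""
    rng = random.Random(seed)
    n = len(sim)
    assign = [rng.randrange(k) for _ in range(n)]  # random initial clustering
    for _ in range(max_passes):
        old = list(assign)
        unlocked = set(range(n))
        while unlocked:
            v = rng.choice(sorted(unlocked))       # pick an unlocked element
            best_c, best_cost = assign[v], current_cost(sim, assign, k)
            for c in range(k):                     # move v where the cost drops the most
                assign[v] = c
                cost = current_cost(sim, assign, k)
                if cost < best_cost:
                    best_c, best_cost = c, cost
            assign[v] = best_c
            unlocked.remove(v)                     # lock v
        if assign == old:                          # nothing moved in this pass: converged
            break
    return assign
```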


Cut-based Clustering

• Find a clustering that minimizes 𝐜𝐨𝐬𝐭 = Σ𝑖 Inter-sim(𝐶𝑖) / Inner-sim(𝐶𝑖).

• An algorithm

• Example (𝑘 = 2). Three elements a, b, c with sim(a, b) = 1, sim(a, c) = 3, sim(b, c) = 2.

• Current clustering: {a} and {b, c}. Cost = 0 + (3 + 1)/2 = 2.

• If we move c (clustering {a, c} and {b}), cost = (1 + 2)/3 + 0 = 1.

• If we move b (clustering {a, b} and {c}), cost = (3 + 2)/1 + 0 = 5.

• So moving c is the better move.

• Convention: Inner-sim(𝐶𝑖) = ∞ if |𝐶𝑖| = 1 (its term contributes 0), or you can do some smoothing by assigning a base similarity.
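Using the clustering_cost sketch from above, the numbers in this example can be checked directly (indexing a = 0, b = 1, c = 2):

```python
# Similarity matrix for the example: sim(a,b) = 1, sim(a,c) = 3, sim(b,c) = 2.
sim = [[0, 1, 3],
       [1, 0, 2],
       [3, 2, 0]]

print(clustering_cost(sim, [{0}, {1, 2}]))  # {a} vs {b,c}: (3 + 1)/2 = 2.0
print(clustering_cost(sim, [{0, 2}, {1}]))  # {a,c} vs {b}: (1 + 2)/3 = 1.0
print(clustering_cost(sim, [{0, 1}, {2}]))  # {a,b} vs {c}: (3 + 2)/1 = 5.0
```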


Cut-based Clustering

• Find a clustering that minimizes 𝐜𝐨𝐬𝐭 = Σ𝑖 Inter-sim(𝐶𝑖) / Inner-sim(𝐶𝑖).

• An algorithm

• Heuristic algorithm.

• May not be optimal

• Is the solution good?

• Reasonable. Cost is iteratively decreased.

• Does it converge?

• Yes.


Cut-based Clustering

• Find a clustering that minimizes 𝐜𝐨𝐬𝐭 = Σ𝑖 Inter-sim(𝐶𝑖) / Inner-sim(𝐶𝑖).

• An algorithm

• Initialize 𝐶1, … , 𝐶𝑘 randomly.

• Repeat until convergence (does this converge?)

• Unlock all elements.

• Repeat until all elements are locked. (does this converge?)

• Randomly select one cluster 𝐶𝑖.

• Randomly select one unlocked element 𝑣 ∈ 𝐶𝑖, if any.

• Move 𝑣 to the cluster such that 𝒄𝒐𝒔𝒕 is maximally decreased.

• Lock 𝑣.


Cut-based Clustering

• Find a clustering that minimizes 𝐜𝐨𝐬𝐭 = Σ𝑖 Inter-sim(𝐶𝑖) / Inner-sim(𝐶𝑖).

• Any other choices within the algorithm? Yes.

• For example, instead of selecting an unlocked element at random, you may select the element that, once considered, can maximally decrease the cost.

Cut-based Clustering

• Comparison with k-means:

• In both, the number of clusters is known in advance.

• Both need some initialization.

• Both iteratively improve the solution.

• Cut-based clustering considers both the inter-cluster and the inner-cluster similarity.

• K-means only considers the inner-cluster similarity.


Agglomerative Clustering

• Idea: combine small clusters.


Agglomerative Clustering

• Idea: combine small clusters.

• Framework:

• Maintain a set of clusters

• Initially, each instance is one cluster

• Repeat

• Merge the two closest clusters

• Until only one cluster remains

• Key: how to define the closeness of two clusters?


Agglomerative Clustering

• Key: how to define closeness of clusters?

• First, define the closeness of each pair of instances.

• The closeness of two clusters can then be:

• The closest pair (single-link clustering)

• The farthest pair (complete-link clustering; the diameter)

• The sum of all pairs? The average of all pairs.

• Ward's method: if you can define the distance within a cluster, merge the pair of clusters that results in the minimum increase in within-cluster distance. (These choices map onto the sketch below.)
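These linkage criteria correspond directly to, for example, SciPy's hierarchical clustering routines; a small sketch on made-up 2-D data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two made-up blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, size=(20, 2)),
               rng.normal([3, 3], 0.3, size=(20, 2))])

Z_single   = linkage(X, method="single")    # closest pair between clusters
Z_complete = linkage(X, method="complete")  # farthest pair (diameter)
Z_average  = linkage(X, method="average")   # average over all pairs
Z_ward     = linkage(X, method="ward")      # minimum increase in within-cluster variance
```

Each Z encodes the merge hierarchy; scipy.cluster.hierarchy.dendrogram(Z) draws it.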


Agglomerative Clustering (Hastie)

• The result of agglomerative clustering is a hierarchy of clusters, drawn as a dendrogram.

• So what if we want 𝑘 clusters? Cut the dendrogram at the level that leaves exactly 𝑘 branches (see the sketch below).
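With SciPy this cut is performed by fcluster, continuing the linkage sketch above:

```python
from scipy.cluster.hierarchy import fcluster

# Cut the ward hierarchy into k = 2 flat clusters; labels[i] is the cluster of point i.
labels = fcluster(Z_ward, t=2, criterion="maxclust")
```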


Agglomerative Clustering

• The dendrogram can also be used to detect outliers: points that merge with the rest only at a large height.


Divisive Clustering

• Idea: split a large cluster into two


Divisive Clustering

• Idea: split a large cluster into two

• Framework:

• Maintain a set of clusters

• Initially, all instances form one cluster

• Repeat

• Split one cluster into two

• Until each cluster is a singleton.

• Key: Which cluster should we split? How to split it?


Divisive Clustering (Andrea)

Key: Which cluster should we split? How to split it?


Divisive Clustering


• Which cluster should we split?

• If we grow the entire dendrogram and the splitting rule is local, it does not matter.

• Otherwise, you may select the one with the highest cost.

• How to split it? (many choices)

• Partition it into two equal parts such that the cost is minimized.

• DIANA (see the description and sketch below).



DIANA: To divide the selected cluster, the algorithm first looks for its most disparate observation (i.e., the one with the largest average dissimilarity to the other observations of the selected cluster). This observation initiates the "splinter group". In subsequent steps, the algorithm reassigns observations that are closer to the "splinter group" than to the "old party". The result is a division of the selected cluster into two new clusters.
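A simplified sketch of that splitting step; the dissimilarity is assumed to be given as a matrix diss[i][j], and the function name and pass-based reassignment loop are my own choices.

```python
import numpy as np

def diana_split(diss, cluster):
    """Split one cluster into a 'splinter group' and the 'old party', DIANA-style."""
    members = list(cluster)
    # The most disparate observation: largest average dissimilarity to the others.
    avg = [np.mean([diss[i][j] for j in members if j != i]) for i in members]
    splinter = {members[int(np.argmax(avg))]}
    old_party = set(members) - splinter
    moved = True
    while moved and len(old_party) > 1:
        moved = False
        for i in list(old_party):
            if len(old_party) == 1:
                break
            d_old = np.mean([diss[i][j] for j in old_party if j != i])
            d_new = np.mean([diss[i][j] for j in splinter])
            if d_new < d_old:          # closer to the splinter group: reassign
                old_party.remove(i)
                splinter.add(i)
                moved = True
    return splinter, old_party
```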


Hierarchical Clustering - Summary

• No need to specify the number of clusters in advance.

• Can be time consuming: the time complexity is at least O(𝑛²), where 𝑛 is the total number of objects.

• The hierarchical structure matches intuition in some domains.

• But the interpretation is subjective.


Summary

• K-means

• Cut-based clustering

• Agglomerative clustering

• Divisive clustering


Equal-sized k-clustering

Cut-based k-clustering.

cost = Σ𝑖 Inter-cost(𝐶𝑖) / Inner-cost(𝐶𝑖)

• Initialize 𝐶1, … , 𝐶𝑘 randomly.

• Repeat until convergence:

• Unlock all elements.

• Repeat until all elements are locked:

• Randomly select one cluster 𝐶𝑖.

• Randomly select one unlocked element 𝑣 ∈ 𝐶𝑖, if any.

• Move 𝑣 to the cluster such that 𝒄𝒐𝒔𝒕 is maximally decreased.

• Lock 𝑣.

Given a set of 𝑘 ⋅ 𝑚 elements, we want an equal-sized 𝑘-clustering. That is, each cluster has exactly 𝑚 elements.

Please describe a cut-based algorithm for such a purpose.

Hint: How to take account of the size of the clusters?