Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine...

64
Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk of slides kindly provided by Prof. Sandra Avila) Institute of Computing (IC/Unicamp) MC886/MO444 REC D reasoning for complex data

Transcript of Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine...

Page 1: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Machine Learning andPattern Recognition

A High Level Overview

Prof. Anderson Rocha(Main bulk of slides kindly provided by Prof. Sandra Avila)

Institute of Computing (IC/Unicamp)

MC886/MO444

REC Dreasoning for complex data

Page 2: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Machine Learning Technique

Input

Target

Output Target?

Supervised Learning

Page 3: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Machine Learning Technique

Input

Output ?Target

Unsupervised Learning

Page 4: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Unsupervised Learning

Machine Learning Technique

Input Output

The goal of unsupervised learning is to find patterns in the data, and build new and useful representations of it.

Page 5: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Today’s Agenda

● Clustering

○ k-Means Algorithm

○ Optimization Objective

○ Random Initialization

○ Choosing the Number of Clusters

○ k-Means Variations

○ Evaluation Performance Clustering

Page 6: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Clusteringk-Means Algorithm

Page 7: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Did anyone say pizza?

https://www.youtube.com/watch?v=IpGxLWOIZy4

Page 8: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Did anyone say pizza?

Page 9: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Clustering

Centroids

Clusters

Page 10: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means: Image Segmentation

K = 10 K = 3 K = 2Original

Credit: Christopher Bishop

Page 11: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means: Image Segmentation

K = 10 K = 3 K = 2Original

Credit: Christopher Bishop

ColorQuantization

Page 12: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Algorithm

1. Define the k centroids.

2. Find the closest centroid & update cluster assignments.

3. Move the centroids to the center of their clusters.

4. Repeat steps 2 and 3 until the centroid stop moving a lot at each iteration.

Page 13: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Algorithm (Lloyd's Algorithm)

Input:

➔ K (number of clusters)

➔ Training set

Page 14: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Algorithm (Lloyd's Algorithm)

Randomly initialize K cluster centroids ∈ ℝn

repeat {for i = 1 to m

:= index (from 1 to K) of cluster centroid closest tofor k = 1 to K

:= mean of points assigned to cluster k}

Page 15: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Algorithm (Lloyd's Algorithm)

Randomly initialize K cluster centroids ∈ ℝn

repeat {for i = 1 to m

:= index (from 1 to K) of cluster centroid closest tofor k = 1 to K

:= mean of points assigned to cluster k} Move (Update) centroid step

Cluster assignment step

Page 16: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Algorithm (Complexity)

● Relatively efficient: O(Kmnt)○ K = #clusters○ m = #vectors (samples)○ n = #dimension of vectors○ t = #iterations○ n ≪ m

Page 17: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Clustering Optimization Objective

Page 18: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Optimization Objective= index of cluster (from 1 to K) to which example is currently assigned

= cluster centroid k

Page 19: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Optimization Objective= index of cluster (from 1 to K) to which example is currently assigned

= cluster centroid k

= cluster centroid of cluster to which example has been assigned

Optimization objective:

Page 20: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Optimization Objective

Randomly initialize K cluster centroids ∈ ℝn

repeat {for i = 1 to m

:= index (from 1 to K) of cluster centroid closest tofor k = 1 to K

:= mean of points assigned to cluster k}

Page 21: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Clustering Initialization

Page 22: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Random Initialization

Assign each object to a random cluster &

Computes the initial centroid of each cluster.

Page 23: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Should have K < m.

Randomly pick K trainingexamples.

Set equal to theseK examples.

Random Initialization (Forgy, 1965)

Page 24: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk
Page 25: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Random Initialization

for i = 1 to 100 {Randomly initialize k-Means.

Run k-means. Get .Compute cost function J.

}

Pick clustering that gave lowest cost J( ).

Page 26: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Can we do better?

Page 27: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

● One idea for initializing k-Means is to use a farthest-first traversal on the data set, to pick K points that are far away from each other.

Can we do better?

Page 28: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk
Page 29: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

● One idea for initializing k-Means is to use a farthest-first traversal on the data set, to pick K points that are far away from each other.

● However, this is too sensitive to outliers.

Can we do better?

Page 30: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means++ (Arthur & Vassilvitski, 2007)

● It works similarly to the “farthest” heuristic.

● Choose each point at random, with probability proportional to its squared distance from the centers chosen already.

Page 31: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means++ (Arthur & Vassilvitski, 2007)

● It works similarly to the “farthest” heuristic.

● Choose each point at random, with probability proportional to its squared distance from the centers chosen already.

scikit-learn (default)

Page 32: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

ClusteringChoosing the number of clusters

Page 33: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

What is the right value of K?

Page 34: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

What is the right value of K?

Page 35: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

What is the right value of K?

Page 36: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

1 2 3 4 5 67 8 9 10Number of clusters (K)

Cos

t Fun

ctio

n J

Page 37: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

1 2 3 4 5 67 8 9 10Number of clusters (K)

Cos

t Fun

ctio

n J

Page 38: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

Number of clusters (K)

Cos

t Fun

ctio

n J

1 2 3 4 5 67 8 9 10

Page 39: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

Number of clusters (K)

Cos

t Fun

ctio

n J

1 2 3 4 5 67 8 9 10

Page 40: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

Number of clusters (K)

Cos

t Fun

ctio

n J

1 2 3 4 5 67 8 9 10

Page 41: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

Number of clusters (K)

Cos

t Fun

ctio

n J

1 2 3 4 5 67 8 9 10

Page 42: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

Number of clusters (K)

Cos

t Fun

ctio

n J

1 2 3 4 5 67 8 9 10

Page 43: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

1 2 3 4 5 67 8 9 10Number of clusters (K)

Cos

t Fun

ctio

n J

Page 44: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

1 2 3 4 5 67 8 9 10Number of clusters (K)

Cos

t Fun

ctio

n J

“Elbow”

Page 45: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

1 2 3 4 5 67 8 9 10Number of clusters (K)

Cos

t Fun

ctio

n J

Page 46: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Elbow Method

1 2 3 4 5 67 8 9 10Number of clusters (K)

Cos

t Fun

ctio

n J

Q: You find that cost function J is much higher for k = 5 than for k =3. What can you conclude?

Page 47: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means Variations

Page 48: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Mini-batch k-Means

● Uses mini-batches to reduce the computation time, while still attempting to optimize the same objective function.

● Converges faster than k-Means, but the quality of the results is reduced.

Page 49: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Medians Clustering

● Instead of calculating the mean for each cluster to determine its centroid, one instead calculates the median.

● Minimizing error over all clusters with respect to the 1-normdistance metric, as opposed to the square of the 2-normdistance metric (which k-Means does).

Page 50: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Medoids Clustering

● Instead of calculating the mean for each cluster to determine its centroid, one instead calculates the medoid.

● Minimizing error over all clusters with respect to the 1-normdistance metric.

● In contrast to the k-Means, k-Medoids chooses data points as centroids.

Page 51: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means (top) vs k-Medoids (bottom)

Credit: https://commons.wikimedia.org/wiki/File:K-means_versus_k-medoids.png

Page 52: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

k-Means (left) vs k-Medoids (right)

Credit: https://commons.wikimedia.org/wiki/File:K-means_versus_k-medoids.png

Page 53: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Fuzzy Clustering (Soft Clustering)

● Each data point can belong to more than one cluster.

Page 54: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Hierarchical Clustering

● Agglomerative (“bottom up”): each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

● Divisive (“top down”) :all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Page 55: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

DBSCAN Clustering

● Density-Based Spatial Clustering of Applications with Noise

● Given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions.

Page 56: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Clustering Performance Evaluation

Page 57: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Clustering Evaluation

● Evaluating the performance of a clustering algorithm is not as trivial as counting the number of errors or the precision and recall of a supervised classification algorithm.

Page 58: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Clustering Evaluation

● Evaluating the performance of a clustering algorithm is not as trivial as counting the number of errors or the precision and recall of a supervised classification algorithm.

● Adjusted Rand index● Mutual Information based scores● Homogeneity, completeness and V-measure● Silhouette Coefficient

Page 59: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Silhouette Coefficient

● The silhouette value is a measure of how similar a sample is to its own cluster (cohesion) compared to other clusters (separation).

● The silhouette ranges from −1 to +1. ○ High value = the clustering configuration is appropriate.○ Low value = the clustering configuration may have too many or

too few clusters.

Page 60: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Silhouette Coefficient

● The Silhouette Coefficient is defined for each sample and is composed of two scores:○ a: The mean distance between a sample and all other points

in the same cluster.○ b: The mean distance between a sample and all other points

in the next nearest cluster.

Page 61: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Silhouette Coefficient

● The Silhouette Coefficient s for a single sample is given as:

● The score is bounded between -1 for incorrect clustering and +1 for highly dense clustering (a ≪ b). Scores around zero indicate overlapping clusters.

Page 62: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

http://scikit-learn.org/stable/modules/clustering.html#clustering

Page 63: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

https://www.naftaliharris.com/blog/visualizing-k-means-clustering/

Page 64: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2… · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

References

Machine Learning Books

● Pattern Recognition and Machine Learning, Chap. 9 “Mixture Models and EM”

● Pattern Classification, Chap. 10 “Unsupervised Learning and Clustering”

Machine Learning Courses

● https://www.coursera.org/learn/machine-learning, Week 8