Supervised Learning Recap (Machine Learning)


Page 1

Supervised Learning Recap

Machine Learning

Page 2

Last Time

• Support Vector Machines
• Kernel Methods

Page 3

Today

• Review of Supervised Learning
• Unsupervised Learning
– (Soft) K-means clustering
– Expectation Maximization
– Spectral Clustering
– Principal Components Analysis
– Latent Semantic Analysis

Page 4

Supervised Learning

• Linear Regression
• Logistic Regression
• Graphical Models
– Hidden Markov Models
• Neural Networks
• Support Vector Machines
– Kernel Methods

Page 5

Major concepts

• Gaussian, Multinomial, Bernoulli Distributions
• Joint vs. Conditional Distributions
• Marginalization
• Maximum Likelihood
• Risk Minimization
• Gradient Descent
• Feature Extraction, Kernel Methods
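Two of the concepts above, marginalization and conditioning, can be shown on a toy joint distribution (the probability table is made up for illustration):

```python
import numpy as np

# Hypothetical joint distribution P(X, Y) over two binary variables.
joint = np.array([[0.1, 0.2],   # X = 0; columns are Y = 0, Y = 1
                  [0.3, 0.4]])  # X = 1

# Marginalization: P(X) = sum over y of P(X, y).
p_x = joint.sum(axis=1)

# Conditioning: P(Y | X = 0) = P(X = 0, Y) / P(X = 0).
p_y_given_x0 = joint[0] / p_x[0]

print(p_x)           # ≈ [0.3 0.7]
print(p_y_given_x0)  # ≈ [1/3 2/3]
```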

Page 6

Some favorite distributions

• Bernoulli

• Multinomial

• Gaussian

Page 7

Maximum Likelihood

• Identify the parameter values that yield the maximum likelihood of generating the observed data.

• Take the partial derivative of the likelihood function
• Set it to zero
• Solve

• NB: the maximum likelihood parameters are the same as the maximum log likelihood parameters, since the log is monotonic
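For a Bernoulli distribution these steps give the closed form p̂ = sample mean; a small check on made-up coin-flip data (all values here are illustrative):

```python
import numpy as np

# Coin flips: 1 = heads, 0 = tails (made-up data).
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# Bernoulli log likelihood: sum_i x_i log p + (1 - x_i) log(1 - p).
def log_likelihood(p, x):
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Setting the derivative to zero gives the closed form p_hat = mean(x).
p_hat = data.mean()

# The closed form matches a brute-force search over candidate values of p.
grid = np.linspace(0.01, 0.99, 99)
best_on_grid = grid[np.argmax([log_likelihood(p, data) for p in grid])]
print(p_hat, best_on_grid)  # both ≈ 0.75
```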

Page 8

Maximum Log Likelihood

• Why do we like the log function?
• It turns products (difficult to differentiate) into sums (easy to differentiate)

• log(xy) = log(x) + log(y)
• log(x^c) = c log(x)

Page 9

Risk Minimization

• Pick a loss function
– Squared loss
– Linear loss
– Perceptron (classification) loss

• Identify the parameters that minimize the loss function.
– Take the partial derivative of the loss function
– Set to zero
– Solve
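As a sketch of this recipe with squared loss, minimized iteratively by gradient descent rather than in closed form (the data, learning rate, and iteration count are made-up choices):

```python
import numpy as np

# Toy data generated from y = 2x + 1 (no noise), purely for illustration.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 1.0

w, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    pred = w * X + b
    # Partial derivatives of the mean squared loss with respect to w and b.
    dw = 2.0 * np.mean((pred - y) * X)
    db = 2.0 * np.mean(pred - y)
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))  # ≈ 2.0 and 1.0
```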

Page 10

Frequentists v. Bayesians

• Point estimates vs. Posteriors
• Risk Minimization vs. Maximum Likelihood
• L2-Regularization
– Frequentists: Add a constraint on the size of the weight vector
– Bayesians: Introduce a zero-mean prior on the weight vector
– Result is the same!

Page 11

L2-Regularization

• Frequentists:
– Introduce a cost on the size of the weights

• Bayesians:
– Introduce a prior on the weights

Page 12

Types of Classifiers

• Generative Models
– Highest resource requirements
– Need to approximate the joint probability

• Discriminative Models
– Moderate resource requirements
– Typically fewer parameters to approximate than generative models

• Discriminant Functions
– Can be trained probabilistically, but the output does not include confidence information

Page 13

Linear Regression

• Fit a line to a set of points
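A minimal sketch of fitting that line by least squares, on made-up points:

```python
import numpy as np

# Made-up points lying roughly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])

# Design matrix with a bias column; least squares solves for slope and intercept.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(slope, 2), round(intercept, 2))  # 1.98 1.08
```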

Page 14

Linear Regression

• Extension to higher dimensions
– Polynomial fitting
– Arbitrary function fitting
• Wavelets
• Radial basis functions
• Classifier output

Page 15

Logistic Regression

• Fit Gaussians to the data for each class
• The decision boundary is where the PDFs cross

• Setting the gradient to zero has no “closed form” solution
• Use Gradient Descent
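A minimal sketch of that gradient descent for logistic regression, on made-up 1-D data (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Linearly separable 1-D toy data, purely illustrative.
X = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = 0.0, 0.0
lr = 0.5
for _ in range(1000):
    p = sigmoid(w * X + b)
    # Gradient of the negative log likelihood (no closed form, hence descent).
    w -= lr * np.mean((p - y) * X)
    b -= lr * np.mean(p - y)

preds = (sigmoid(w * X + b) > 0.5).astype(int)
print(preds)  # [0 0 0 1 1 1]
```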

Page 16

Graphical Models

• General way to describe the dependence relationships between variables.

• Junction Tree Algorithm allows us to efficiently calculate marginals over any variable.

Page 17

Junction Tree Algorithm

• Moralization
– “Marry the parents”
– Make the graph undirected

• Triangulation
– Add chords so that no chordless cycle of length four or more remains

• Junction Tree Construction
– Identify separators such that the running intersection property holds

• Introduction of Evidence
– Pass slices around the junction tree to generate marginals

Page 18

Hidden Markov Models

• Sequential Modeling
– Generative Model

• Relationship between observation and state (class) sequences

Page 19

Perceptron

• Step function used for squashing.
• Classifier-as-neuron metaphor.
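A sketch of a single perceptron with a step squashing function, trained with the classic mistake-driven update on the AND function (a toy choice, not from the slides):

```python
import numpy as np

def step(z):
    # Hard threshold: the perceptron's squashing function.
    return (z >= 0).astype(int)

# AND function: linearly separable toy data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
for _ in range(20):
    for xi, yi in zip(X, y):
        err = yi - step(xi @ w + b)
        # Mistake-driven update: weights change only when the prediction is wrong.
        w += err * xi
        b += err

print(step(X @ w + b))  # [0 0 0 1]
```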

Page 20

Perceptron Loss

• Classification Error vs. Sigmoid Error
– Loss is only calculated on mistakes

• Perceptrons use strictly classification error

Page 21

Neural Networks

• Interconnected Layers of Perceptrons or Logistic Regression “neurons”

Page 22

Neural Networks

• There are many possible configurations of neural networks
– Vary the number of layers
– Vary the size of each layer

Page 23

Support Vector Machines

• Maximum Margin Classification

[Figure: a small-margin vs. a large-margin separating boundary]

Page 24

Support Vector Machines

• Optimization Function

• Decision Function
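The decision function takes the kernelized form f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b, with the sum running over the support vectors. A sketch with made-up support vectors, dual coefficients, and an RBF kernel (none of these values come from an actual training run):

```python
import numpy as np

# Hypothetical support vectors, labels, and dual coefficients (illustrative only).
sv = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0]])
alpha = np.array([0.5, 0.3, 0.8])
y_sv = np.array([1, 1, -1])
b = -0.2

def rbf(x1, x2, gamma=0.5):
    # Radial basis function kernel: K(x1, x2) = exp(-gamma * ||x1 - x2||^2).
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def decision(x):
    # f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b; the predicted class is sign(f).
    return sum(a * yi * rbf(s, x) for a, yi, s in zip(alpha, y_sv, sv)) + b

print(np.sign(decision(np.array([1.5, 1.0]))))    # 1.0 (positive side)
print(np.sign(decision(np.array([-1.0, -1.0]))))  # -1.0 (negative side)
```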

Page 25


Visualization of Support Vectors

Page 26

Questions?

• Now would be a good time to ask questions about Supervised Techniques.

Page 27

Clustering

• Identify discrete groups of similar data points
• Data points are unlabeled

Page 28

Recall K-Means

• Algorithm
– Select K, the desired number of clusters
– Initialize K cluster centroids
– For each point in the data set, assign it to the cluster with the closest centroid
– Update each centroid based on the points assigned to its cluster
– If any data point has changed clusters, repeat
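The algorithm above can be sketched directly (initializing from the first k points is a simplification for the sketch; random initialization is more typical):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    # Initialize centroids from the first k points (random init is more typical).
    centroids = X[:k].copy()
    assign = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # no point changed clusters
        assign = new_assign
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.5, 0.0],
              [10.0, 9.5], [0.0, 0.5], [9.5, 10.0]])
centroids, assign = kmeans(X, 2)
print(assign)  # [0 1 0 1 0 1]
```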

Page 29

k-means output

Page 30

Soft K-means

• In k-means, we force every data point to belong to exactly one cluster.

• This constraint can be relaxed.

Minimizes the entropy of cluster assignment

Page 31

Soft k-means example

Page 32

Soft k-means

• We still define a cluster by a centroid, but we calculate the centroid as the weighted mean of all the data points

• Convergence is based on a stopping threshold rather than changed assignments
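One common way to realize this is to compute softmax responsibilities from squared distances, controlled by a stiffness parameter β; each centroid is then the responsibility-weighted mean, and iteration stops once the centroids move less than a threshold. A sketch (β, the data, and the deterministic initialization are illustrative choices):

```python
import numpy as np

def soft_kmeans(X, k, beta=1.0, n_iter=100, tol=1e-6):
    centroids = X[:k].copy()  # deterministic init for the sketch
    r = None
    for _ in range(n_iter):
        # Soft assignment: responsibility of each cluster for each point.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        r = np.exp(-beta * d2)
        r /= r.sum(axis=1, keepdims=True)
        # Each centroid is the responsibility-weighted mean of all points.
        new_centroids = (r.T @ X) / r.sum(axis=0)[:, None]
        if np.linalg.norm(new_centroids - centroids) < tol:
            break  # stopping threshold instead of "no assignment changed"
        centroids = new_centroids
    return centroids, r

X = np.array([[0.0, 0.0], [10.0, 10.0], [0.5, 0.0],
              [10.0, 9.5], [0.0, 0.5], [9.5, 10.0]])
centroids, r = soft_kmeans(X, 2)
print(r.sum(axis=1))  # each point's responsibilities sum to 1
```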

Page 33

Gaussian Mixture Models

• Rather than identifying clusters by “nearest” centroids, fit a set of k Gaussians to the data.

Page 34

GMM example

Page 35

Gaussian Mixture Models

• Formally, a mixture model is the weighted sum of a number of PDFs, where the weights are determined by a distribution.
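In one dimension this is p(x) = Σₖ πₖ N(x | μₖ, σₖ²) with Σₖ πₖ = 1. A quick numerical check that the weighted sum is still a valid density (the component parameters are made up):

```python
import numpy as np

# Made-up 1-D mixture: two Gaussian components with weights summing to one.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 1.5])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_pdf(x):
    # p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

# The weighted sum is still a valid density: it integrates to one.
xs = np.linspace(-15.0, 15.0, 10001)
dx = xs[1] - xs[0]
total = (mixture_pdf(xs) * dx).sum()
print(total)  # ≈ 1.0
```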

Page 36

Graphical Models with Unobserved Variables

• What if you have variables in a graphical model that are never observed?
– Latent Variables

• Training latent variable models is an unsupervised learning application

[Figure: example model with variables “laughing”, “amused”, “sweating”, “uncomfortable”]

Page 37

Latent Variable HMMs

• We can cluster sequences using an HMM with unobserved state variables

• We will train the latent variable models using Expectation Maximization

Page 38

Expectation Maximization

• Both the training of GMMs and Gaussian models with latent variables are accomplished using Expectation Maximization
– Step 1: Expectation (E-step)
• Evaluate the “responsibilities” of each cluster with the current parameters
– Step 2: Maximization (M-step)
• Re-estimate parameters using the existing “responsibilities”

• Related to k-means
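The two steps can be sketched for a 1-D GMM (initializing the means at the data extremes and using synthetic two-cluster data are simplifications for illustration):

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    # Init: means at the data extremes, shared variance, uniform weights.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var())
    pi = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = (pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

# Synthetic data: two clusters around -5 and +5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)])
mu, var, pi = em_gmm_1d(x)
print(np.sort(mu))  # roughly [-5, 5]
```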

Page 39

Questions

• One more time for questions on supervised learning…

Page 40

Next Time

• Gaussian Mixture Models (GMMs)
• Expectation Maximization