Machine learning

Machine Learning Andrea Iacono https://github.com/andreaiacono/MachineLearning

Description

An introduction to several techniques of machine learning.

Transcript of Machine learning

Page 1: Machine learning

MachineLearning

Andrea Iacono https://github.com/andreaiacono/MachineLearning

Page 2: Machine learning

Machine Learning: Intro

What is Machine Learning?

[Wikipedia]: a branch of artificial intelligence that allows the construction and study of systems that can learn from data

Page 3: Machine learning

Machine Learning: Intro

Some approaches:

- Regression analysis
- Similarity and metric learning
- Decision tree learning
- Association rule learning
- Artificial neural networks
- Genetic programming
- Support vector machines (classification and regression analysis)
- Clustering
- Bayesian networks

Page 4: Machine learning

Machine Learning: Intro

Supervised learning vs. unsupervised learning

Machine learning vs. data mining

Page 5: Machine learning

Machine Learning: Regression analysis

Regression Analysis

A statistical technique for estimating the relationships between a dependent variable and one or more independent variables

Page 6: Machine learning

Machine Learning: Regression analysis

Prediction of house prices

Size (x)    Price (y)
0.80        70
0.90        83
1.00        74
1.10        93
1.40        89
1.40        58
1.50        85
1.60        114
1.80        95
2.00        100
2.40        138
2.50        111
2.70        124
3.20        172
3.50        172

Page 7: Machine learning

Machine Learning: Regression analysis

Prediction of house prices

Hypothesis:

h_\theta(x) = \theta_0 + \theta_1 x

Page 8: Machine learning

Machine Learning: Regression analysis

Prediction of house prices

Hypothesis:

h_\theta(x) = \theta_0 + \theta_1 x

Cost function for linear regression:

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Page 9: Machine learning

Machine Learning: Regression analysis

Prediction of house prices

Hypothesis:

h_\theta(x) = \theta_0 + \theta_1 x

Cost function for linear regression:

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Gradient descent:

repeat until convergence:

\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}

Page 10: Machine learning

Machine Learning: Regression analysis

Prediction of house prices

Iterative minimization of the cost function with gradient descent
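
To make the procedure concrete, here is a minimal Java sketch of batch gradient descent on the house-price data from the slides (the learning rate and iteration count are illustrative choices, not values from the talk):

// Batch gradient descent for h(x) = theta0 + theta1 * x,
// using the house-price data from the earlier slide.
public class LinearRegressionGD {

    public static void main(String[] args) {
        double[] x = {0.80, 0.90, 1.00, 1.10, 1.40, 1.40, 1.50, 1.60,
                      1.80, 2.00, 2.40, 2.50, 2.70, 3.20, 3.50};
        double[] y = {70, 83, 74, 93, 89, 58, 85, 114,
                      95, 100, 138, 111, 124, 172, 172};
        int m = x.length;
        double theta0 = 0.0, theta1 = 0.0;
        double alpha = 0.1;   // learning rate (illustrative)

        for (int iter = 0; iter < 10000; iter++) {
            double grad0 = 0.0, grad1 = 0.0;
            for (int i = 0; i < m; i++) {
                double error = theta0 + theta1 * x[i] - y[i];  // h(x_i) - y_i
                grad0 += error;
                grad1 += error * x[i];
            }
            // simultaneous update of both parameters
            theta0 -= alpha * grad0 / m;
            theta1 -= alpha * grad1 / m;
        }
        System.out.printf("h(x) = %.2f + %.2f x%n", theta0, theta1);
    }
}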

Page 11: Machine learning

Machine Learning: Regression analysis

Hands on

Page 12: Machine learning

Machine Learning: Regression analysis

Regression analysis

- one / multiple variables
- linear / higher-order curves
- several optimization algorithms:
  - linear regression
  - logistic regression
  - simulated annealing
  - ...

Page 13: Machine learning

Machine Learning: Regression analysis

Overfitting vs underfitting

Page 14: Machine learning

Machine Learning: Similarity and metric learning

Similarity and metric learning

- concept of distance

Page 15: Machine learning

Machine Learning: Similarity and metric learning

Euclidean distance

\text{euclidean distance}(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

Page 16: Machine learning

Machine Learning: Similarity and metric learning

Manhattan distance

\text{manhattan distance}(p, q) = \sum_{i=1}^{n} \left| p_i - q_i \right|
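
Both distances translate directly into Java; here is a small sketch (the method names are mine, not the repository's):

// Euclidean and Manhattan distance between two n-dimensional points.
public class Distances {

    static double euclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - q[i];
            sum += d * d;                    // (p_i - q_i)^2
        }
        return Math.sqrt(sum);
    }

    static double manhattan(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(p[i] - q[i]);    // |p_i - q_i|
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {1.0, 2.0}, q = {4.0, 6.0};
        System.out.println(euclidean(p, q)); // 5.0
        System.out.println(manhattan(p, q)); // 7.0
    }
}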

Page 17: Machine learning

Machine Learning: Similarity and metric learning

Pearson's correlation

\text{Pearson's correlation}(p, q) = \frac{\sum_{i=1}^{n} p_i q_i - \frac{\sum_{i=1}^{n} p_i \sum_{i=1}^{n} q_i}{n}}{\sqrt{\left( \sum_{i=1}^{n} p_i^2 - \frac{\left( \sum_{i=1}^{n} p_i \right)^2}{n} \right) \left( \sum_{i=1}^{n} q_i^2 - \frac{\left( \sum_{i=1}^{n} q_i \right)^2}{n} \right)}}
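
The computational form above maps one-to-one onto a loop that accumulates the five sums; a Java sketch (names are illustrative):

// Pearson's correlation between two equally sized vectors.
public class Pearson {

    static double pearson(double[] p, double[] q) {
        int n = p.length;
        double sumP = 0, sumQ = 0, sumP2 = 0, sumQ2 = 0, sumPQ = 0;
        for (int i = 0; i < n; i++) {
            sumP  += p[i];
            sumQ  += q[i];
            sumP2 += p[i] * p[i];
            sumQ2 += q[i] * q[i];
            sumPQ += p[i] * q[i];
        }
        double num = sumPQ - sumP * sumQ / n;
        double den = Math.sqrt((sumP2 - sumP * sumP / n) * (sumQ2 - sumQ * sumQ / n));
        return den == 0 ? 0 : num / den;     // 0 when a vector is constant
    }

    public static void main(String[] args) {
        double[] p = {1, 2, 3, 4, 5};
        double[] q = {2, 4, 6, 8, 10};
        System.out.println(pearson(p, q));   // 1.0: perfectly correlated
    }
}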

Page 18: Machine learning

Machine Learning: Similarity and metric learning

Collaborative filtering

Searches a large group of users to find a small subset with tastes like yours. Based on what this subset likes or dislikes, the system can recommend other items to you.

Two main approaches:
- user-based filtering
- item-based filtering

Page 19: Machine learning

Machine Learning: Similarity and metric learning

User based filtering

- based on the ratings given to the items, we can measure the distance between users

- we can recommend to the user the items rated highest by the closest users
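
A compact, illustrative Java sketch of the idea: find the closest user by Euclidean distance over the commonly rated items, then suggest what that user rated highly but the target has not seen (the users, items, and ratings are made up for the example):

import java.util.HashMap;
import java.util.Map;

// User-based filtering: recommend the unseen items of the nearest user.
public class UserBasedFiltering {

    public static void main(String[] args) {
        // user -> (item -> rating), illustrative data
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("item1", 5.0, "item2", 3.0, "item3", 4.0));
        ratings.put("bob",   Map.of("item1", 5.0, "item2", 2.5, "item4", 4.5));
        ratings.put("carol", Map.of("item1", 1.0, "item2", 5.0, "item4", 2.0));

        String target = "alice";
        String closest = null;
        double best = Double.MAX_VALUE;

        // measure the distance to every other user on commonly rated items
        for (String user : ratings.keySet()) {
            if (user.equals(target)) continue;
            double sum = 0;
            for (String item : ratings.get(target).keySet()) {
                if (ratings.get(user).containsKey(item)) {
                    double d = ratings.get(target).get(item) - ratings.get(user).get(item);
                    sum += d * d;
                }
            }
            double dist = Math.sqrt(sum);
            if (dist < best) { best = dist; closest = user; }
        }

        // recommend the items the closest user rated but the target has not seen
        for (Map.Entry<String, Double> e : ratings.get(closest).entrySet()) {
            if (!ratings.get(target).containsKey(e.getKey())) {
                System.out.println("recommend " + e.getKey()
                        + " (rated " + e.getValue() + " by " + closest + ")");
            }
        }
    }
}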

Page 20: Machine learning

Hands on

Machine Learning: Similarity and metric learning

Page 21: Machine learning

Machine Learning: Similarity and metric learning

Is user-based filtering good for:
- scalability?
- sparse data?
- quickly changing data?

Page 22: Machine learning

Machine Learning: Similarity and metric learning

Is user-based filtering good for:
- scalability?
- sparse data?
- quickly changing data?

No: it's better to use item-based filtering.

Page 23: Machine learning

Machine Learning: Similarity and metric learning

Euclidean distance for item-based filtering: nothing has changed!

- based on the ratings received from the users, we can measure the distance among items

- we can recommend an item to a user by taking the items closest to the ones the user rated highest

Page 24: Machine learning

Hands on

Machine Learning: Similarity and metric learning

Page 25: Machine learning

Machine Learning: Bayes' classifier

Bayes' theorem

P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}

Example: given a company where 70% of developers use Java and 30% use C++, and knowing that half of the Java developers always use the enhanced for loop, if you look at the snippet:

for (int j=0; j<100; j++) {
    t = tests[j];
}

what is the probability that the developer who wrote it uses Java?

Page 26: Machine learning

Machine Learning: Bayes' classifier

Bayes' theorem

P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}

Example: given a company where 70% of developers use Java and 30% use C++, and knowing that half of the Java developers always use the enhanced for loop, if you look at the snippet:

for (int j=0; j<100; j++) {
    t = tests[j];
}

what is the probability that the developer who wrote it uses Java?

Hint:
A = developer uses Java
B = developer writes old-style for loops

Page 27: Machine learning

Machine Learning: Bayes' classifier

Bayes' theorem

P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}

Example: given a company where 70% of developers use Java and 30% use C++, and knowing that half of the Java developers always use the enhanced for loop, if you look at the snippet:

for (int j=0; j<100; j++) {
    t = tests[j];
}

what is the probability that the developer who wrote it uses Java?

Solution:
A = developer uses Java
B = developer writes old-style for loops

P(A) = prob. that a developer uses Java = 0.7
P(B|A) = prob. that a Java developer uses the old for loop = 0.5
P(B) = prob. that any developer uses the old for loop = 0.3 + 0.7 * 0.5 = 0.65 (all the C++ developers, plus half of the Java developers)

P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} = \frac{0.5 \cdot 0.7}{0.65} \approx 0.54
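
The same computation in Java, just to make the arithmetic explicit:

// Bayes' theorem applied to the Java-vs-C++ example above.
public class BayesExample {
    public static void main(String[] args) {
        double pA = 0.7;               // P(A): a developer uses Java
        double pBgivenA = 0.5;         // P(B|A): a Java developer uses the old for loop
        double pB = 0.3 + 0.7 * 0.5;   // P(B): any developer uses the old for loop = 0.65
        System.out.println(pBgivenA * pA / pB);   // ~0.54
    }
}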

Page 28: Machine learning

Machine Learning: Bayes' classifier

Naive Bayes' classifier

- supervised learning
- trained on a set of known classes
- computes probabilities of elements to be in a class
- smoothing required

P_c(w_1, \ldots, w_n) = \frac{\prod_{i=1}^{n} P(c \mid w_i)}{\prod_{i=1}^{n} P(c \mid w_i) + \prod_{i=1}^{n} \left( 1 - P(c \mid w_i) \right)}
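
A sketch of this combining formula in Java; in a real classifier the per-word probabilities P(c|w_i) would come out of a (smoothed) training phase, so the values below are illustrative:

// Combines per-word probabilities P(c|w_i) into an overall probability
// that the document belongs to class c, as in the formula above.
public class NaiveBayesCombiner {

    static double combine(double[] pClassGivenWord) {
        double prod = 1.0, prodInv = 1.0;
        for (double p : pClassGivenWord) {
            prod    *= p;         // Π P(c|w_i)
            prodInv *= (1 - p);   // Π (1 - P(c|w_i))
        }
        return prod / (prod + prodInv);
    }

    public static void main(String[] args) {
        double[] probs = {0.9, 0.8, 0.6};     // illustrative word probabilities
        System.out.println(combine(probs));   // ~0.98
    }
}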

Page 29: Machine learning

Machine Learning: Bayes' classifier

Naive Bayes' classifier

Example

- we want a classifier for Twitter messages
- define a set of classes: { art, tech, home, events, ... }
- train the classifier with a set of already classified tweets
- when a new tweet arrives, the classifier will (hopefully) tell us which class it belongs to

Page 30: Machine learning

Machine Learning: Bayes' classifier

Hands on

Page 31: Machine learning

Machine Learning: Bayes' classifier

Sentiment analysis

- define two classes: { +, - }
- define a set of words: { like, enjoy, hate, bore, fun, ... }
- train an NBC with a set of known +/- comments
- let the NBC classify any new comment to know if it's + or -

- performance is related to the quality of the training set

Page 32: Machine learning

Machine Learning: Clustering

Clustering

- Unsupervised learning
- Different algorithms:
  - Hierarchical clustering
  - K-Means clustering
  - ...

Common use cases:
- navigation habits
- online commerce
- social/political attitudes
- ...

Page 33: Machine learning

Machine Learning: Clustering

K-Means aims at identifying cluster centroids such that an item belonging to a cluster X is closer to the centroid of X than to the centroid of any other cluster.

K-Means clustering

Page 34: Machine learning

Machine Learning: Clustering

The algorithm requires the number of clusters as input, in this case 3. The centroids are placed in the item space, typically at random locations.

K-Means clustering

Page 35: Machine learning

Machine Learning: Clustering

The algorithm will then assign to each centroid all items that are closer to it than to any other centroid.

K-Means clustering

Page 36: Machine learning

Machine Learning: Clustering

The centroids are then moved to the center of mass of the items in the clusters.

K-Means clustering

Page 37: Machine learning

Machine Learning: Clustering

A new iteration occurs, taking into account the new centroid positions.

K-Means clustering

Page 38: Machine learning

Machine Learning: Clustering

The centroids are again moved to the center of mass of the items in the clusters.

K-Means clustering

Page 39: Machine learning

Machine Learning: Clustering

Another iteration occurs, taking into account the new centroid positions.

K-Means clustering

Page 40: Machine learning

Machine Learning: Clustering

The centroids are again moved to the center of mass of the items in the clusters.

K-Means clustering

Page 41: Machine learning

Machine Learning: Clustering

Another iteration occurs, taking into account the new centroid positions. Note that this time the cluster membership did not change. The cluster centers will not move anymore.

K-Means clustering

Page 42: Machine learning

Machine Learning: Clustering

The solution is found.

K-Means clustering
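
The whole loop fits in a few lines of Java; here is an illustrative one-dimensional sketch (the data, k = 3, and the starting centroid positions are arbitrary):

import java.util.Arrays;

// K-Means in one dimension: assign items to the closest centroid, move
// each centroid to the center of mass of its cluster, repeat until stable.
public class KMeans1D {

    public static void main(String[] args) {
        double[] items = {1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.5, 8.5};
        double[] centroids = {0.0, 4.0, 10.0};   // k = 3, arbitrary start
        int[] assignment = new int[items.length];

        boolean changed = true;
        while (changed) {
            changed = false;
            // assignment step: each item goes to its closest centroid
            for (int i = 0; i < items.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(items[i] - centroids[c]) < Math.abs(items[i] - centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            // update step: move each centroid to the mean of its cluster
            for (int c = 0; c < centroids.length; c++) {
                double sum = 0; int count = 0;
                for (int i = 0; i < items.length; i++) {
                    if (assignment[i] == c) { sum += items[i]; count++; }
                }
                if (count > 0) centroids[c] = sum / count;
            }
        }
        System.out.println(Arrays.toString(centroids)); // [1.0, 5.0, 9.0]
    }
}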

Page 43: Machine learning

Machine Learning: Clustering

Hands on

Page 44: Machine learning

Machine Learning: Neural networks

Neural networks

"A Logical Calculus of the Ideas Immanent in Nervous Activity", by McCulloch and Pitts, 1943

Page 45: Machine learning

Machine Learning: Neural networks

Neural networks

Feedforward Perceptron

Page 46: Machine learning

Machine Learning: Neural networks

Neural networks

Logic operators with neural networks:

Threshold = 0

X0     X1     X2     Σ      Result
-10    0      0      -10    0
-10    0      20     10     1
-10    20     0      10     1
-10    20     20     30     1

OR operator

Page 47: Machine learning

Machine Learning: Neural networks

Neural networks

Logic operators with neural networks:

Threshold = 0

X0     X1     X2     Σ      Result
-30    0      0      ?      ?
-30    0      20     ?      ?
-30    20     0      ?      ?
-30    20     20     ?      ?

Which operator?

Page 48: Machine learning

Machine Learning: Neural networks

Neural networks

Logic operators with neural networks:

Threshold = 0

X0     X1     X2     Σ      Result
-30    0      0      -30    0
-30    0      20     -10    0
-30    20     0      -10    0
-30    20     20     10     1

AND operator
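
Both tables can be reproduced with a single neuron in a few lines of Java; only the bias weight changes between OR and AND:

// One neuron with weights 20, 20 on the inputs and a step activation at
// threshold 0; bias -10 gives OR, bias -30 gives AND, as in the tables.
public class LogicPerceptron {

    static int neuron(double bias, int x1, int x2) {
        double sum = bias + 20 * x1 + 20 * x2;   // Σ
        return sum > 0 ? 1 : 0;                  // threshold = 0
    }

    public static void main(String[] args) {
        for (int x1 = 0; x1 <= 1; x1++) {
            for (int x2 = 0; x2 <= 1; x2++) {
                System.out.printf("%d OR %d = %d, %d AND %d = %d%n",
                        x1, x2, neuron(-10, x1, x2),
                        x1, x2, neuron(-30, x1, x2));
            }
        }
    }
}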

Page 49: Machine learning

Machine Learning: Neural networks

Hands on

Page 50: Machine learning

Machine Learning: Neural networks

Neural networks

Backpropagation

Phase 1: Propagation
- Forward propagation of a training pattern's input through the neural network, generating the output activations
- Backward propagation of the output activations through the neural network, using the training pattern's target, to generate the deltas of all output and hidden neurons

Phase 2: Weight update
- Multiply each weight's output delta and input activation to get the gradient of the weight
- Bring the weight in the opposite direction of the gradient by subtracting a fraction of it (the learning rate) from the weight
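
An illustrative Java sketch of the two phases on a tiny 2-2-1 sigmoid network trained on XOR; the layer sizes, learning rate, and epoch count are arbitrary, and a network this small can occasionally get stuck in a local minimum depending on the random seed:

import java.util.Random;

// Stochastic backpropagation on a 2-2-1 network learning XOR.
public class BackpropXor {

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public static void main(String[] args) {
        double[][] in = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] target = {0, 1, 1, 0};
        Random rnd = new Random(42);

        double[][] wh = new double[2][3];   // hidden weights: [neuron][x1, x2, bias]
        double[] wo = new double[3];        // output weights: [h0, h1, bias]
        for (double[] row : wh) for (int i = 0; i < 3; i++) row[i] = rnd.nextDouble() - 0.5;
        for (int i = 0; i < 3; i++) wo[i] = rnd.nextDouble() - 0.5;

        double alpha = 0.5;                 // learning rate
        for (int epoch = 0; epoch < 100000; epoch++) {
            for (int s = 0; s < in.length; s++) {
                // Phase 1: forward propagation of the input
                double[] h = new double[2];
                for (int j = 0; j < 2; j++)
                    h[j] = sigmoid(wh[j][0] * in[s][0] + wh[j][1] * in[s][1] + wh[j][2]);
                double o = sigmoid(wo[0] * h[0] + wo[1] * h[1] + wo[2]);

                // Phase 1: backward propagation to get the deltas
                double deltaO = (o - target[s]) * o * (1 - o);
                double[] deltaH = new double[2];
                for (int j = 0; j < 2; j++)
                    deltaH[j] = deltaO * wo[j] * h[j] * (1 - h[j]);

                // Phase 2: weight update, gradient = delta * input activation
                for (int j = 0; j < 2; j++) wo[j] -= alpha * deltaO * h[j];
                wo[2] -= alpha * deltaO;    // bias input is 1
                for (int j = 0; j < 2; j++) {
                    wh[j][0] -= alpha * deltaH[j] * in[s][0];
                    wh[j][1] -= alpha * deltaH[j] * in[s][1];
                    wh[j][2] -= alpha * deltaH[j];
                }
            }
        }
        // the trained network should now approximate XOR
        for (int s = 0; s < in.length; s++) {
            double h0 = sigmoid(wh[0][0] * in[s][0] + wh[0][1] * in[s][1] + wh[0][2]);
            double h1 = sigmoid(wh[1][0] * in[s][0] + wh[1][1] * in[s][1] + wh[1][2]);
            double o = sigmoid(wo[0] * h0 + wo[1] * h1 + wo[2]);
            System.out.printf("%.0f XOR %.0f -> %.3f%n", in[s][0], in[s][1], o);
        }
    }
}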

Page 51: Machine learning

Machine Learning: Neural networks

Neural networks

Multilayer perceptrons

Page 52: Machine learning

Machine Learning: Neural networks

Hands on

Page 53: Machine learning

Machine Learning: Genetic algorithms

Genetic algorithms

GA is a programming technique that mimics biological evolution as a problem-solving strategy

Steps:
- map the variables of the problem into a sequence of bits, a chromosome
- create a random population of chromosomes
- let the population evolve according to evolutionary rules:
  - the higher the fitness, the higher the chance of breeding
  - crossover of chromosomes
  - mutation in chromosomes
- if an optimal solution is found, or after n steps, the process is stopped
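
An illustrative Java sketch of these steps on the classic "one-max" toy problem: chromosomes are bit strings, fitness counts the 1-bits, and all parameters are arbitrary choices:

import java.util.Random;

// A minimal genetic algorithm: tournament selection, single-point
// crossover, bit-flip mutation, stopping at the optimum or after n steps.
public class SimpleGA {

    static final int LENGTH = 32;             // bits per chromosome
    static final int POP = 50;                // population size
    static final int STEPS = 200;             // the "n steps" bound
    static final double MUTATION_RATE = 0.01;
    static final Random rnd = new Random();

    // fitness: number of 1-bits; the optimum is a chromosome of all ones
    static int fitness(boolean[] c) {
        int f = 0;
        for (boolean bit : c) if (bit) f++;
        return f;
    }

    // tournament selection: the fitter of two random individuals breeds
    static boolean[] select(boolean[][] pop) {
        boolean[] a = pop[rnd.nextInt(POP)], b = pop[rnd.nextInt(POP)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    static int bestFitness(boolean[][] pop) {
        int best = 0;
        for (boolean[] c : pop) best = Math.max(best, fitness(c));
        return best;
    }

    public static void main(String[] args) {
        boolean[][] pop = new boolean[POP][LENGTH];
        for (boolean[] c : pop) for (int i = 0; i < LENGTH; i++) c[i] = rnd.nextBoolean();

        for (int step = 0; step < STEPS && bestFitness(pop) < LENGTH; step++) {
            boolean[][] next = new boolean[POP][];
            for (int k = 0; k < POP; k++) {
                boolean[] p1 = select(pop), p2 = select(pop);
                boolean[] child = new boolean[LENGTH];
                int cut = rnd.nextInt(LENGTH);                   // single-point crossover
                for (int i = 0; i < LENGTH; i++) child[i] = i < cut ? p1[i] : p2[i];
                for (int i = 0; i < LENGTH; i++)                 // bit-flip mutation
                    if (rnd.nextDouble() < MUTATION_RATE) child[i] = !child[i];
                next[k] = child;
            }
            pop = next;
        }
        System.out.println("best fitness: " + bestFitness(pop) + "/" + LENGTH);
    }
}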

Page 54: Machine learning

Machine Learning: Genetic algorithms

Genetic algorithms

Mutation

Crossover

Page 55: Machine learning

Hands on

Machine Learning: Genetic algorithms

Page 56: Machine learning

Thanks!

Machine Learning

The code is available at: https://github.com/andreaiacono/MachineLearning