Machine Learning: Supervised Algorithms
Presented by:
AKHIAT Yassine, AKACHAR El Yazid
Faculté des Sciences Dhar El Mahraz, Fès
Academic year: 2014/2015
Master SIRM
Outline
1. Introduction
2. Supervised Algorithms
3. Some Real-life Applications
4. Naïve Bayes Classifier
5. Implementation
6. Conclusion
Introduction
Machine Learning
from dictionary.com
“The ability of a machine to improve its
performance based on previous results.”
Arthur Samuel (1959): "Field of study that gives computers the
ability to learn without being explicitly programmed."
Introduction
Machine learning algorithms are organized into a taxonomy based on the desired outcome of the algorithm. Common algorithm types include:
Supervised algorithms
Unsupervised algorithms
Reinforcement algorithms
etc.
Algorithms Types
Supervised Algorithms
Supervised learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances.
In other words :
The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features.
Definition
Motivation
Supervised Algorithms
Why supervised learning emerged:
In every domain, huge amounts of data are generated every second. Supervised learning exploits this data, and the experience it encodes, to make better decisions in the future.
Supervised Algorithms
Data: a set of data records (also called examples, instances or cases) described by k attributes: A1, A2, …, Ak.
A class: each example is labelled with a pre-defined class.
Goal: to learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.
Approach
Supervised Algorithms
Supervised Algorithms Process
Learning (training): Learn a model using the training data
Testing: Test the model using unseen test data to assess the model accuracy
Accuracy = Number of correct classifications / Total number of test cases
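The accuracy formula above amounts to a one-liner; here is a minimal sketch with made-up label lists:

```python
# Hypothetical true and predicted labels for 5 test cases.
y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]

# Accuracy = correct classifications / total test cases.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 4 correct out of 5 -> 0.8
```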
Supervised Algorithms
Example: Regression
Age prediction. Regression: predict a continuous-valued output (the age).
Supervised Algorithms
Example: Classification
Classification: predict a discrete-valued output (0 or 1),
e.g. the Boolean AND function.
Supervised Algorithms
Classification Algorithms
Neural Networks Decision Tree K- Nearest neighbors Naïve Bayes ETC …
Supervised Algorithms
Decision Tree
Leaves represent classifications, and branches represent tests on features that lead to those classifications.
[Figure: a two-dimensional dataset in the (x1, x2) plane split by the tests X1 > 1 and X2 > 2; each internal node branches on YES/NO and the leaves carry the classes.]
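The two-test tree from the figure can be sketched directly as nested conditionals; the leaf classes ("+" / "-") are illustrative, since the figure leaves them unlabelled:

```python
# Decision tree sketch: internal nodes test X1 > 1 and X2 > 2.
# Leaf labels "+" and "-" are made up for illustration.
def predict(x1, x2):
    if x1 > 1:
        if x2 > 2:
            return "+"
        return "-"
    return "-"

print(predict(2, 3))  # passes both tests -> "+"
print(predict(0, 5))  # fails X1 > 1 -> "-"
```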
Supervised Algorithms
K-Nearest Neighbors: find the k nearest neighbors of the test example and infer its class from their known classes, e.g. k = 3.
[Figure: a test point among labelled points in the (x1, x2) plane; its class is inferred from its 3 nearest neighbors.]
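A minimal k-NN sketch with k = 3 and Euclidean distance; the training points and labels are made up for illustration:

```python
import math
from collections import Counter

# Hypothetical labelled training points: ((x1, x2), class).
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 7), "B")]

def knn_predict(x, k=3):
    # Sort training points by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.5, 1.5)))  # all 3 nearest neighbours are "A" -> "A"
```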
Some Real life applications
Systems Biology :Gene expression microarray data
Face detection, signature recognition
Medicine : Predict if a patient has heart ischemia by a spectral analysis of his/her ECG
Recommender systems
Text categorization : Spam filter
Some Real life applications
Microarray data
Separate malignant from healthy tissues based on the mRNA expression profile of the tissue.
Some Real life applications
Text categorization
Categorize text documents into predefined categories; for example, categorize e-mail as "Spam" or "Not Spam".
Naïve Bayes
Named after Thomas Bayes (1701–1761), who proposed Bayes' Theorem, published posthumously in 1763.
Definition
Naïve Bayesian Classification
Bayesian Classification
What is it ?
The Bayesian classifier is based on Bayes' Theorem with independence assumptions between predictors. It is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
Bayesian Classification
Bayes Theorem
Bayes' Theorem provides a way of calculating the posterior probability P(C|X) from P(C), P(X) and P(X|C):

P(C|X) = P(X|C) · P(C) / P(X)

P(C|X) is the posterior probability of the class given the predictor (attribute).
P(X|C) is the likelihood: the probability of the predictor given the class.
P(C) is the prior probability of the class.
P(X) is the prior probability of the predictor.
Bayesian Classification
Classify a new Instance
(Outlook=sunny, Temp=cool, Humidity=high, Wind=strong)
How do we classify this new instance?
Bayesian Classifier
Frequency Tables

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

P(Play=Yes) = 9/14    P(Play=No) = 5/14
Bayesian Classification
Example: let's classify this new instance:

Outlook  Temperature  Humidity  Wind    Play Tennis
Sunny    Cool         High      Strong  ??

Likelihood of Yes:
L = P(Outl=Sunny|Yes) * P(Temp=Cool|Yes) * P(Hum=High|Yes) * P(Wind=Strong|Yes) * P(Yes)
L = 2/9 * 3/9 * 3/9 * 3/9 * 9/14 = 0.0053

Likelihood of No:
L = P(Outl=Sunny|No) * P(Temp=Cool|No) * P(Hum=High|No) * P(Wind=Strong|No) * P(No)
L = 3/5 * 1/5 * 4/5 * 3/5 * 5/14 = 0.0206
Bayesian Classification
Now we normalize:

P(Yes) = 0.0053 / (0.0053 + 0.0206) = 0.20
P(No)  = 0.0206 / (0.0053 + 0.0206) = 0.80

So the predicted class is No:

Outlook  Temperature  Humidity  Wind    Play Tennis
Sunny    Cool         High      Strong  No
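The worked computation above can be reproduced in a few lines of Python, taking the probabilities straight from the frequency tables:

```python
# Score each class: prior * product of conditional probabilities
# for the instance (Sunny, Cool, High, Strong).
p_yes = 9/14 * 2/9 * 3/9 * 3/9 * 3/9   # P(Yes) * likelihoods given Yes
p_no  = 5/14 * 3/5 * 1/5 * 4/5 * 3/5   # P(No)  * likelihoods given No

total = p_yes + p_no  # normalize so the two posteriors sum to 1
print(round(p_yes, 4), round(p_no, 4))                  # 0.0053 0.0206
print(round(p_yes / total, 2), round(p_no / total, 2))  # 0.2 0.8 -> predict No
```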
Bayesian Classification
The Zero-Frequency Problem

When an attribute value (e.g. Outlook=Overcast) doesn't occur with some class value (e.g. Play Tennis=No), its estimated probability is zero and wipes out the whole product.
Solution: add 1 to all the counts (Laplace smoothing).
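The add-one fix can be illustrated with the Overcast/No cell of the frequency table (0 out of 5, with 3 possible Outlook values):

```python
# Outlook=Overcast never occurs with Play=No: 0 of the 5 "No" examples.
count_overcast_no = 0
count_no = 5
n_outlook_values = 3  # Sunny, Overcast, Rain

raw = count_overcast_no / count_no  # 0.0 -- zeroes out the whole product
# Laplace smoothing: add 1 to each count, add the number of values to the total.
smoothed = (count_overcast_no + 1) / (count_no + n_outlook_values)
print(smoothed)  # 1/8 = 0.125
```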
Bayesian Classification
Numerical Attributes
Numerical variables need to be transformed into categorical counterparts before constructing their frequency tables. The other option is to use the distribution of the numerical variable to get a good estimate of the likelihood; for example, one common practice is to assume a normal distribution for numerical variables.
Bayesian Classification
Normal distribution
The probability density function of the normal distribution is defined by two parameters: the mean (μ) and the standard deviation (σ).
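With these two parameters, the density is the standard formula:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```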
Bayesian Classification
Example with a numerical attribute (Humidity):

Play  Humidity values                 Mean  StDev
Yes   86 96 80 65 70 80 70 90 75     79.1  10.2
No    85 90 70 95 91                 86.2  9.7
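Plugging the Humidity statistics above into the normal density gives the conditional likelihoods directly; the query value 74 is an illustrative choice, not from the slides:

```python
import math

def normal_pdf(x, mean, stdev):
    # Normal density with the given mean and standard deviation.
    return math.exp(-(x - mean) ** 2 / (2 * stdev ** 2)) / (stdev * math.sqrt(2 * math.pi))

# Likelihood of Humidity=74 under each class's fitted Gaussian.
print(round(normal_pdf(74, 79.1, 10.2), 4))  # P(Humidity=74 | Yes) ≈ 0.0345
print(round(normal_pdf(74, 86.2, 9.7), 4))   # P(Humidity=74 | No)  ≈ 0.0186
```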
Bayesian Classification
Uses Of Bayes Classification
Text Classification
Spam Filtering
Hybrid Recommender System
Online Application
Bayesian Classification
Advantages
Easy to implement
Requires a small amount of training data to estimate the parameters
Good results obtained in most of the cases
Bayesian Classification
Disadvantages
Assumption: class conditional independence, therefore loss of accuracy
Practically, dependencies exist among variables
E.g., in hospitals, patient data include a profile (age, family history, …), symptoms (fever, cough, …) and diseases (lung cancer, diabetes, …); the dependencies among these cannot be modelled by the Naïve Bayesian classifier.
Application
Spam filtering is the best-known use of naive Bayesian text classification: a naive Bayes classifier is used to identify spam e-mail.
Bayesian spam filtering has become a popular mechanism to distinguish illegitimate spam from legitimate e-mail.
Many modern mail clients implement Bayesian spam filtering, and users can also install separate e-mail filtering programs such as DSPAM, SpamAssassin, SpamBayes and ASSP.
Recap
Naïve Bayes
The Bayesian classifier is based on Bayes' Theorem with independence assumptions between predictors. It is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
Example
Naïve Bayes algorithms
          doc  words                                  class
training  D1   SIRM master FSDM                       A
          D2   SIRM master                            A
          D3   master SIRM                            A
          D4   SIRM recherche FSDM                    B
test      D5   SIRM SIRM SIRM master recherche FSDM   ???

P(A) = Nc/Nd = 3/4,  P(B) = Nc/Nd = 1/4
P(SIRM|A) = (3+1)/(7+4) = 4/11,  P(master|A) = (3+1)/(7+4) = 4/11
P(recherche|A) = (0+1)/(7+4) = 1/11,  P(FSDM|A) = (1+1)/(7+4) = 2/11
Example
Naïve Bayes algorithms
          doc  words                                  class
training  D1   SIRM master FSDM                       A
          D2   SIRM master                            A
          D3   master SIRM                            A
          D4   SIRM recherche FSDM                    B
test      D5   SIRM SIRM SIRM master recherche FSDM   ???

P(A) = Nc/Nd = 3/4,  P(B) = Nc/Nd = 1/4
P(SIRM|B) = (1+1)/(3+4) = 2/7,  P(master|B) = (0+1)/(3+4) = 1/7
P(recherche|B) = (1+1)/(3+4) = 2/7,  P(FSDM|B) = (1+1)/(3+4) = 2/7
Example
P(A|D5) ∝ 3/4 * (4/11)^4 * 1/11 * 2/11 = 0.00022
P(B|D5) ∝ 1/4 * (2/7)^5 * 1/7 = 0.000068

Now we normalize:

P(A|D5) = 0.00022 / (0.000068 + 0.00022) = 0.76
P(B|D5) = 0.000068 / (0.000068 + 0.00022) = 0.24

So the predicted class is A:

test  D5  SIRM SIRM SIRM master recherche FSDM  A
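The whole text-classification example can be reproduced in a few lines of Python; the training data and add-one smoothing follow the tables above:

```python
# Multinomial Naïve Bayes with add-one (Laplace) smoothing.
train = [("SIRM master FSDM".split(), "A"),
         ("SIRM master".split(), "A"),
         ("master SIRM".split(), "A"),
         ("SIRM recherche FSDM".split(), "B")]
test_doc = "SIRM SIRM SIRM master recherche FSDM".split()

vocab = {w for words, _ in train for w in words}  # 4 distinct words

def score(cls):
    docs = [words for words, c in train if c == cls]
    n_words = sum(len(d) for d in docs)   # total word count in the class
    s = len(docs) / len(train)            # prior P(class) = Nc/Nd
    for w in test_doc:
        count = sum(d.count(w) for d in docs)
        s *= (count + 1) / (n_words + len(vocab))  # smoothed P(w|class)
    return s

sa, sb = score("A"), score("B")
print(round(sa / (sa + sb), 2), round(sb / (sa + sb), 2))  # 0.76 0.24 -> class A
```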
Conclusion
The naive Bayes model is tremendously appealing because of its simplicity, elegance, and robustness. It is one of the oldest formal classification algorithms, and yet even in its simplest form it is often surprisingly effective. A large number of modifications have been introduced by the statistical, data mining, machine learning, and pattern recognition communities in an attempt to make it more flexible.