Missing Values
Uploaded by chanpreet-singh
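Missing values
Simple imputation by data type:
• Numeric data: replace missing values with the mean or the median
• Categorical data: replace missing values with the mode
Suited to: data with few outliers (the mean is sensitive to outliers, while the median is robust to them); large datasets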
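A minimal base-R sketch of the simple imputation rules above, assuming a small hypothetical data frame. Reading "few outliers" as the condition for using the mean, it fills a column without outliers using the mean, a column that contains an outlier using the median, and a categorical column using the mode. stat_mode() is a helper written for this sketch, because base R's mode() returns the storage type of an object, not the statistical mode.

```r
# Hypothetical data frame with missing values (NA) in every column.
df <- data.frame(
  age    = c(23, NA, 31, 40, NA, 35),
  income = c(30, 45, NA, 52, 1000, 48),   # 1000 is a deliberate outlier
  city   = c("A", "B", NA, "A", "A", NA),
  stringsAsFactors = FALSE
)

# Statistical mode of a vector; table() drops NA values by default.
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Numeric column with few outliers: impute with the mean
df$age[is.na(df$age)] <- mean(df$age, na.rm = TRUE)
# Numeric column with an outlier: impute with the median (robust to it)
df$income[is.na(df$income)] <- median(df$income, na.rm = TRUE)
# Categorical column: impute with the mode
df$city[is.na(df$city)] <- stat_mode(df$city)

print(df)
```

Here the missing ages become 32.25, the missing income becomes 48 (the mean, pulled up by the outlier, would have been 235), and the missing cities become "A".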
Imputing missing values through prediction
Missing variable | Associated variables | Technique | Assumptions
Categorical | Categorical | • Decision tree • Naïve Bayes | Decision tree: none. Naïve Bayes: predictors are conditionally independent given the class.
Categorical | Numeric | • Logistic regression • K-NN classifier | K-NN classifier: none. Logistic regression: independent observations and a linear relationship between the predictors and the log-odds (normality and homoscedasticity are not required).
Numeric | Numeric | • Regression model • Clustering | Clustering: none. Linear regression: normality of residuals, homoscedasticity, etc.
Numeric | Categorical | • Clustering | None
Categorical | Both | • Decision tree • Multinomial regression | Decision tree: none. Multinomial regression: the same assumptions as logistic regression.
Numeric | Both | • Clustering | None
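As a sketch of the numeric/numeric case with a regression model, the following fits lm() only on the rows where the target is observed and predicts the missing entries from the associated variable. The data are synthetic and the column names x and y are hypothetical; in practice the regression assumptions listed in the table should be checked first.

```r
# Synthetic data: y depends linearly on x, then three y values go missing.
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)
y[c(5, 17, 33)] <- NA
df <- data.frame(x = x, y = y)

observed <- !is.na(df$y)

# Fit the regression on complete rows only
fit <- lm(y ~ x, data = df[observed, ])

# Predict the missing y values from x and store them in a new column
df$y_imputed <- df$y
df$y_imputed[!observed] <- predict(fit, newdata = df[!observed, , drop = FALSE])

print(coef(fit))
print(df[!observed, ])
```

Imputed values lie exactly on the fitted line, so single regression imputation understates the natural spread of the variable.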
K-NN Classifier
Figure: a 3-NN classifier (a new point is assigned the majority class among its 3 nearest neighbors)
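A minimal base-R sketch of a k-NN classifier with Euclidean distance and majority voting, using k = 3 as in the figure. knn_predict() and the toy points are illustrative assumptions, not code from the slides.

```r
# Classify one query point by majority vote among its k nearest neighbors.
knn_predict <- function(train_x, train_y, query, k = 3) {
  # Euclidean distance from the query to every training point
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  nearest <- order(d)[seq_len(k)]       # indices of the k closest points
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]        # ties go to the first factor level
}

# Two well-separated toy clusters in 2-D
train_x <- matrix(c(1, 1,
                    1, 2,
                    2, 1,
                    6, 6,
                    6, 7,
                    7, 6), ncol = 2, byrow = TRUE)
train_y <- factor(c("red", "red", "red", "blue", "blue", "blue"))

knn_predict(train_x, train_y, query = c(1.5, 1.5), k = 3)   # "red"
knn_predict(train_x, train_y, query = c(6.5, 6.2), k = 3)   # "blue"
```

There is no training step: all of the work happens at query time inside knn_predict(). For real use, the recommended package class provides knn().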
Choosing k:
• If k is too small, the classifier is sensitive to noise points.
• If k is too large, the neighborhood may include points from other classes.
Attributes may have to be scaled so that the distance measure is not dominated by a single attribute with a large range.
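To illustrate why scaling matters, this sketch uses two hypothetical attributes, age in years and income in currency units, and finds the nearest neighbor of the first record before and after min-max scaling each column to [0, 1]. The records and nearest_to_first() are assumptions made for the example.

```r
# Hypothetical records: age (years) and income (currency units)
x <- data.frame(age    = c(25, 60, 26, 45),
                income = c(50000, 50500, 53000, 60000))

# Index of the record closest to record 1 under Euclidean distance
nearest_to_first <- function(m) {
  d <- as.matrix(dist(m))[1, -1]   # distances from record 1 to the others
  as.integer(names(which.min(d)))
}

# Min-max scaling of one column to [0, 1]
min_max <- function(v) (v - min(v)) / (max(v) - min(v))
x_scaled <- as.data.frame(lapply(x, min_max))

nearest_to_first(x)          # 2: the small income gap outweighs a 35-year age gap
nearest_to_first(x_scaled)   # 3: the 26-year-old, once both attributes count
```

Base R's scale() (z-score standardization) is a common alternative to min-max scaling.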
The k-NN classifier is a lazy learner: it does not build a model explicitly, but stores the training data and does its work only when a new point has to be classified.
Testing with different values of k
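One way to test different values of k is leave-one-out accuracy: classify each point using all the other points and count the correct predictions. This sketch assumes the built-in iris data set, standardized with scale(), and an arbitrary list of k values; its output is not the result shown on the slides.

```r
# Standardized features and class labels from the built-in iris data
X <- scale(as.matrix(iris[, 1:4]))
y <- iris$Species

# Majority vote among the k nearest training points (Euclidean distance)
knn_vote <- function(train_x, train_y, query, k) {
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  votes <- table(train_y[order(d)[seq_len(k)]])
  names(votes)[which.max(votes)]
}

# Leave-one-out accuracy for a given k
loo_accuracy <- function(k) {
  hits <- vapply(seq_len(nrow(X)), function(i) {
    knn_vote(X[-i, , drop = FALSE], y[-i], X[i, ], k) == as.character(y[i])
  }, logical(1))
  mean(hits)
}

ks <- c(1, 3, 5, 11, 25)
acc <- sapply(ks, loo_accuracy)
print(data.frame(k = ks, accuracy = round(acc, 3)))
```

The k with the highest accuracy is a reasonable choice, bearing in mind the trade-off between noise sensitivity (small k) and mixing classes (large k).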
Naïve Bayesian Classifier
Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
For a message containing the word "free": P(spam|free) = P(free|spam) * P(spam) / P(free)
Since P(spam|free) > P(ham|free), a message containing this word is classified as spam.
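The spam example can be worked through numerically. The probabilities below are hypothetical, chosen only for illustration; P(free) is obtained from the law of total probability over the two classes.

```r
# Hypothetical probabilities, e.g. as if estimated from a training set
p_spam <- 0.13                 # P(spam): prior share of spam messages
p_ham  <- 1 - p_spam           # P(ham)
p_free_given_spam <- 0.20      # P(free | spam)
p_free_given_ham  <- 0.01      # P(free | ham)

# Evidence term P(free) via the law of total probability
p_free <- p_free_given_spam * p_spam + p_free_given_ham * p_ham

# Bayes' theorem for each class
p_spam_given_free <- p_free_given_spam * p_spam / p_free
p_ham_given_free  <- p_free_given_ham  * p_ham  / p_free

round(c(spam = p_spam_given_free, ham = p_ham_given_free), 3)   # spam 0.749, ham 0.251
ifelse(p_spam_given_free > p_ham_given_free, "spam", "ham")     # "spam"
```

Both posteriors share the denominator P(free), so the comparison can also be made on the numerators alone.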
Step 4: Applying the classifier
If the output of equation 1 is greater than that of equation 2, the message is classified as spam; otherwise it is classified as ham.
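A sketch of this step with several words, assuming that equations 1 and 2 are the unnormalized naive Bayes scores P(spam) times the product of P(word | spam), and P(ham) times the product of P(word | ham). The tiny training set, the word list, and nb_score() are hypothetical, and add-one (Laplace) smoothing is an added assumption to avoid zero probabilities.

```r
# Hypothetical training set: word presence (1/0) per message, plus its label
train <- data.frame(
  free = c(1, 1, 0, 0, 0, 1),
  win  = c(1, 0, 1, 0, 0, 0),
  meet = c(0, 0, 0, 1, 1, 1),
  type = c("spam", "spam", "spam", "ham", "ham", "ham")
)
words <- c("free", "win", "meet")

# Unnormalized score: P(class) * product over words of P(word state | class),
# with add-one smoothing on the word-presence probabilities
nb_score <- function(cls, msg) {
  rows  <- train[train$type == cls, words]
  prior <- nrow(rows) / nrow(train)
  p_present  <- (colSums(rows) + 1) / (nrow(rows) + 2)
  likelihood <- ifelse(msg == 1, p_present, 1 - p_present)
  prior * prod(likelihood)
}

msg  <- c(free = 1, win = 1, meet = 0)
eqn1 <- nb_score("spam", msg)   # 0.144
eqn2 <- nb_score("ham",  msg)   # 0.008
if (eqn1 > eqn2) "spam" else "ham"
```

Dividing eqn1 by eqn1 + eqn2 turns the scores into the posterior P(spam | message), about 0.947 here. In e1071, naiveBayes() accepts a laplace argument for the same kind of smoothing.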
How it works
library(e1071)  # provides naiveBayes()
sms_classifier <- naiveBayes(sms_train, sms_raw_train$type)  # training features, class labels