Missing Values

Data type: numeric
• Fill missing values with the mean or median

Data type: categorical
• Fill missing values with the mode

Conditions: fewer outliers, large dataset
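The mean, median and mode fills listed above can be sketched in base R as follows. This is a minimal illustration, not a prescribed method: the data frame, its column names and the helper `mode_value` are all assumptions made up for the example.

```r
# Toy data frame with missing values (values are made up for illustration)
df <- data.frame(
  age  = c(25, 30, NA, 35, 120),
  city = c("Pune", "Delhi", "Pune", NA, "Pune"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of a vector
mode_value <- function(x) {
  counts <- table(x)  # table() drops NA by default
  names(counts)[which.max(counts)]
}

# Numeric column: compare the two candidate fill values.
# The outlier (120) pulls the mean up but barely moves the median.
age_mean   <- mean(df$age, na.rm = TRUE)
age_median <- median(df$age, na.rm = TRUE)
df$age[is.na(df$age)] <- age_median

# Categorical column: fill with the mode
df$city[is.na(df$city)] <- mode_value(df$city)
```

With an outlier present the median is usually the safer fill; with few outliers the mean and median tend to be close, so either can be used.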

Missing value through prediction

Missing variable: categorical; associated variable: categorical
• Techniques: decision tree, Naïve Bayes
• Decision tree needs no assumptions
• Naïve Bayes assumes the variables are independent

Missing variable: categorical; associated variable: numeric
• Techniques: logistic regression, K-NN classifier
• K-NN classifier needs no assumptions
• Regression assumptions: normality, homoscedasticity, etc.

Missing variable: numeric; associated variable: numeric
• Techniques: regression model, clustering
• Clustering needs no assumptions
• Regression assumptions: normality, homoscedasticity, etc.

Missing variable: numeric; associated variable: categorical
• Technique: clustering
• No assumptions

Missing variable: categorical; associated variable: both
• Techniques: decision tree, multinomial regression
• Decision tree needs no assumptions
• Regression assumptions apply

Missing variable: numeric; associated variable: both
• Technique: clustering
• No assumptions
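As one instance of the combinations above (numeric missing variable, numeric associated variable, regression model), here is a hedged base R sketch. The data frame and the column names `height` and `weight` are invented for the example, and the values are chosen to be exactly linear so the result is easy to check by hand.

```r
# Toy data: weight is missing in one row; height is the associated variable
df <- data.frame(
  height = c(150, 160, 170, 180, 165),
  weight = c(50, 60, 70, 80, NA)
)

missing <- is.na(df$weight)
known   <- df[!missing, ]

# Fit a linear regression on the complete rows only
fit <- lm(weight ~ height, data = known)

# Predict the missing value from the associated variable and fill it in
df$weight[missing] <- predict(fit, newdata = df[missing, , drop = FALSE])
```

The same pattern (fit on complete rows, predict for incomplete rows) carries over to the other techniques listed; only the model changes.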

K-NN Classifier

3-NN classifier


If k is too small, the classifier is sensitive to noise points.

If k is too large, the neighborhood may include points from other classes.


Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
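The effect of scaling on distance can be illustrated with a small base R sketch. The attributes `income` and `age`, their values, and the choice of min-max scaling are assumptions for the example; other scalings (such as z-scores) are equally possible.

```r
# Two attributes on very different scales (made-up values)
x <- data.frame(
  income = c(30000, 31000, 40000, 90000),
  age    = c(25, 60, 26, 40)
)

# Unscaled: Euclidean distance is driven almost entirely by income
d_raw <- as.matrix(dist(x))

# Min-max scaling maps every attribute to the range [0, 1]
min_max  <- function(v) (v - min(v)) / (max(v) - min(v))
x_scaled <- as.data.frame(lapply(x, min_max))
d_scaled <- as.matrix(dist(x_scaled))

# Nearest neighbour of row 1 before and after scaling
# (+1 because column 1, the point itself, was dropped)
nn_raw    <- unname(which.min(d_raw[1, -1])) + 1
nn_scaled <- unname(which.min(d_scaled[1, -1])) + 1
```

Before scaling, row 2 is nearest to row 1 despite a 35-year age gap, because its income differs by only 1000. After scaling, row 3 (similar age, moderately different income) becomes the nearest neighbour.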

The K-NN classifier is a lazy learner because it does not build a model explicitly; it stores the training data and defers all computation to classification time.

Testing with different k
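A minimal base R sketch of trying several values of k on the same query point. The helper `knn_predict` and the data are hypothetical: the training set contains one mislabelled "noise" point inside the class "a" cluster so that the effect of a too-small and a too-large k both show up. Note that there is no training step, which is what makes the method lazy.

```r
# Hypothetical helper: classify one query point by majority vote
# among its k nearest training points (Euclidean distance)
knn_predict <- function(train_x, train_y, query, k) {
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  nearest <- order(d)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Made-up training data: three "a" points, one noisy "b" point
# sitting among them, and five "b" points far away
train_x <- matrix(c(
  1.0, 1.00,
  1.2, 0.90,
  0.8, 1.10,
  1.1, 1.05,   # noise point labelled "b"
  5.0, 5.00,
  5.2, 4.90,
  4.8, 5.10,
  5.1, 5.20,
  4.9, 4.80
), ncol = 2, byrow = TRUE)
train_y <- c("a", "a", "a", "b", "b", "b", "b", "b", "b")

query <- c(1.08, 1.02)

# Classify the same point with a small, a moderate and a very large k
results <- sapply(c(1, 3, 9), function(k) {
  knn_predict(train_x, train_y, query, k)
})
```

Here k = 1 follows the noise point and returns "b", k = 3 returns "a", and k = 9 (the whole training set) returns "b" again because the distant class outnumbers the local one. In practice k is usually chosen by comparing accuracy on held-out data.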

Step 4: Applying the classifier

If the output of equation 1 is greater than the output of equation 2, the message is classified as spam; otherwise it is classified as ham.
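The comparison rule can be sketched in base R. This assumes that equations 1 and 2 are the Naïve Bayes scores for spam and ham respectively (class prior multiplied by the likelihoods of the words in the message); that reading is an assumption. The priors, word likelihoods and the helper `nb_score` are invented for illustration.

```r
# Made-up class priors and per-word likelihoods (illustrative only)
prior <- c(spam = 0.2, ham = 0.8)
likelihood <- rbind(
  spam = c(free = 0.30, meeting = 0.02),
  ham  = c(free = 0.05, meeting = 0.20)
)

# Hypothetical helper: prior times the product of the likelihoods
# of the words that appear in the message
nb_score <- function(class, words) {
  prior[[class]] * prod(likelihood[class, words])
}

# Hypothetical helper: apply the comparison rule to one message
classify <- function(words) {
  score_spam <- nb_score("spam", words)  # plays the role of equation 1
  score_ham  <- nb_score("ham", words)   # plays the role of equation 2
  if (score_spam > score_ham) "spam" else "ham"
}

label_free    <- classify("free")
label_meeting <- classify("meeting")
```

The scores are not normalised probabilities; only their relative size matters for the decision.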

How it works

library(e1071)  # provides naiveBayes()
# Train on the training features and their spam/ham labels
sms_classifier <- naiveBayes(sms_train, sms_raw_train$type)
