Post on 01-Apr-2015
R for Classification
Jennifer Broughton
Shimadzu Research Laboratory
Manchester, UK
jennifer.broughton@srlab.co.uk
2nd May 2013
Classification?
Automatic Identification of Type (Class) of Object from Measured Variables (Features)
Object Type   Feature 1   Feature 2   Feature 3   …   Feature n
Label 1       val[1,1]    val[1,2]    val[1,3]    …   val[1,n]
Label 2       val[2,1]    val[2,2]    val[2,3]    …   val[2,n]
…             …           …           …           …   …
Label m       val[m,1]    val[m,2]    val[m,3]    …   val[m,n]
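In R, a table of this shape maps naturally onto a data frame: one row per object, one numeric column per feature, and a factor column for the class label. A minimal sketch, using the built-in iris data set as a stand-in (an assumption; the slides do not say which data they use):

```r
# iris ships with base R: 150 flowers, 4 measured features, 1 class label.
data(iris)

# One row per object, one column per feature, plus the label column.
features <- iris[, 1:4]      # numeric features, val[i, j]
labels   <- iris$Species     # factor holding the class of each object

str(iris)                    # column types at a glance
table(labels)                # number of objects per class
dim(features)                # m objects by n features
```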
Example Data
Data Preparation & Investigation
EDA Techniques: Box Plots, PCA, Decision Trees, Clustering
Training Set
• Best features to distinguish between classes
• Relationships between features
• Feature reduction
Box Plots
PCA & Multivariate Analysis: ade4, FactoMineR
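Both techniques can be tried with base R alone. The sketch below is illustrative only: it assumes the built-in iris data and uses base R's prcomp() in place of ade4 or FactoMineR, which offer richer PCA output.

```r
data(iris)

# Box plot of one feature split by class: well-separated boxes suggest
# the feature is useful for telling the classes apart.
boxplot(Petal.Length ~ Species, data = iris,
        ylab = "Petal.Length", main = "Petal.Length by class")

# PCA on the centred and scaled features (base R prcomp).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)                 # proportion of variance per component

# Scores on the first two components, coloured by class.
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19)
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

If the first two components already carry most of the variance, the remaining ones are candidates for feature reduction.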
Example Classifier
Classification Algorithms in R
Rattle: R Analytical Tool To Learn Easily (Rattle: A Data Mining GUI for R, Graham J. Williams, The R Journal, 1(2):45-55)
SVM
Ensemble Algorithm
Training and Testing
Classification Algorithm:
• Neural Network
• Support Vector Machine
• Random Forest
Training Set (labelled)
Test Set (unlabelled)
Trained Classifier
Classification Results
Prediction Results + Labels
Assess Predictions:
• Confusion Matrix
• ROC Curve (2 categories)
• ….
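The train/test workflow above can be sketched end to end. This is a hedged illustration, not the slides' own code: it assumes the iris data, a 70/30 random split, and an rpart decision tree standing in for the neural network, SVM, or random forest named on the slide.

```r
library(rpart)               # decision trees; a recommended package shipped with R

set.seed(42)                 # make the random split reproducible
data(iris)

# Hold out 30% of the rows as a test set.
n         <- nrow(iris)
test_idx  <- sample(n, size = round(0.3 * n))
train_set <- iris[-test_idx, ]
test_set  <- iris[test_idx, ]

# Train on the labelled training set.
classifier <- rpart(Species ~ ., data = train_set, method = "class")

# Predict on the test set with the label column removed.
pred <- predict(classifier, newdata = test_set[, 1:4], type = "class")

# Compare the predictions with the held-back labels.
conf_mat <- table(Predicted = pred, Actual = test_set$Species)
print(conf_mat)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```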
Using Classifiers in R
Select Training Data
Build Classifier
Run Classifier
classifier <- algorithm(formula, data, options)
(boosting and nnet)
classifier.pred <- predict(classifier, newdata, options)
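A concrete instance of the build/run pattern, as a sketch only: it assumes the iris data and the nnet package, and the size, decay, and maxit values are arbitrary illustrative choices rather than recommendations.

```r
library(nnet)                # single-hidden-layer neural networks

set.seed(1)                  # nnet starts from random weights
data(iris)
train_idx <- sample(nrow(iris), 100)

# Build: algorithm(formula, data, options)
classifier <- nnet(Species ~ ., data = iris[train_idx, ],
                   size = 4, decay = 0.01, maxit = 200, trace = FALSE)

# Run: predict(classifier, newdata, options)
classifier.pred <- predict(classifier, newdata = iris[-train_idx, ],
                           type = "class")

head(classifier.pred)
```

The formula Species ~ . means "predict Species from all other columns", so the same call works unchanged when features are added or removed.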
SVM & Neural Net Tuning
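Tuning usually means trying several settings and keeping the one that predicts best on data held back from training. The sketch below is a hand-rolled grid search over two nnet options, assuming the iris data and an arbitrary grid; packages such as e1071 provide tune.svm() and tune.nnet() wrappers for the same job.

```r
library(nnet)

set.seed(7)
data(iris)
val_idx <- sample(nrow(iris), 50)
train   <- iris[-val_idx, ]
valid   <- iris[val_idx, ]

# Candidate settings: hidden units (size) and weight decay.
grid <- expand.grid(size = c(2, 4, 8), decay = c(0, 0.01, 0.1))

# Fit one network per setting and record its validation accuracy.
grid$accuracy <- sapply(seq_len(nrow(grid)), function(i) {
  fit  <- nnet(Species ~ ., data = train, size = grid$size[i],
               decay = grid$decay[i], maxit = 200, trace = FALSE)
  pred <- predict(fit, newdata = valid, type = "class")
  mean(pred == valid$Species)
})

grid[order(-grid$accuracy), ]   # best settings first
```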
Classifier Feedback
print(classifier)
plot(classifier)
high Gini Coefficient = high dispersion
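Assuming the note refers to the Gini impurity used by tree-based classifiers (the slide does not say), the measure for a set of labels is one minus the sum of squared class proportions. A small worked sketch with made-up labels:

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)   # class proportions
  1 - sum(p^2)
}

pure  <- gini_impurity(c("a", "a", "a", "a"))   # one class only
mixed <- gini_impurity(c("a", "a", "b", "b"))   # evenly split

pure    # 0
mixed   # 0.5
```

A node containing a single class scores 0; the more evenly the classes are mixed, the higher the value.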
Classifier Prediction Results
predict(type = "class")
predict(type = "prob")
confusion matrix
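The two prediction types can be compared side by side. This sketch assumes an rpart tree fitted to the iris data; the accepted type values differ between packages (nnet, for example, uses "raw" rather than "prob"), so check the help page of the relevant predict method.

```r
library(rpart)
data(iris)

classifier <- rpart(Species ~ ., data = iris, method = "class")
new_obs    <- iris[c(1, 51, 101), 1:4]    # one flower from each species

# Hard labels: one predicted class per row.
predict(classifier, newdata = new_obs, type = "class")

# Soft output: one probability per class, each row sums to 1.
probs <- predict(classifier, newdata = new_obs, type = "prob")
probs

# Confusion matrix on the training data (optimistic; use a test set in practice).
table(Predicted = predict(classifier, type = "class"), Actual = iris$Species)
```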
Binary Classification Results

                       Class Present?
                       Y                 N
Class Detected?   Y    True Positive     False Positive
                  N    False Negative    True Negative
$$\mathrm{TPR} = \frac{TP}{TP + FN} = \text{Sensitivity}$$

$$\mathrm{FPR} = \frac{FP}{TN + FP} = 1 - \text{Specificity}$$
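A quick numeric check of the two rates, using hypothetical counts chosen only for illustration:

```r
# Hypothetical counts, for illustration only.
TP <- 40; FN <- 10     # class present: detected / missed
FP <- 5;  TN <- 45     # class absent: false alarm / correctly rejected

sensitivity <- TP / (TP + FN)    # true positive rate
specificity <- TN / (TN + FP)
fpr         <- FP / (TN + FP)    # false positive rate

sensitivity        # 0.8
fpr                # 0.1
1 - specificity    # 0.1, same as fpr
```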
ROC Curves in R
ROCR package
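A ROC curve plots TPR against FPR as the decision threshold is swept across the classifier's scores. The sketch below computes the curve by hand in base R from made-up scores and labels, then draws the same curve with ROCR's prediction() and performance() if that package happens to be installed.

```r
# Hypothetical scores and true labels (1 = class present), for illustration.
scores <- c(0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10)
labels <- c(1,    1,    0,    1,    1,    0,    1,    0,    0,    0)

# Sweep the threshold from high to low and record (FPR, TPR) at each step.
ord <- order(scores, decreasing = TRUE)
tpr <- c(0, cumsum(labels[ord] == 1) / sum(labels == 1))
fpr <- c(0, cumsum(labels[ord] == 0) / sum(labels == 0))

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc                              # 0.84

plot(fpr, tpr, type = "s", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)            # chance line

# The same curve via ROCR, if the package is installed.
if (requireNamespace("ROCR", quietly = TRUE)) {
  library(ROCR)
  pred <- prediction(scores, labels)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf)
}
```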
Example Results