R for Classification

Jennifer Broughton
Shimadzu Research Laboratory
Manchester, UK

jennifer.broughton@srlab.co.uk

2nd May 2013

Classification?

Automatic Identification of Type (Class) of Object from Measured Variables (Features)

Object Type   Feature 1   Feature 2   Feature 3   …   Feature n
Label 1       val[1,1]    val[1,2]    val[1,3]    …   val[1,n]
Label 2       val[2,1]    val[2,2]    val[2,3]    …   val[2,n]
…             …           …           …           …   …
Label m       val[m,1]    val[m,2]    val[m,3]    …   val[m,n]
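A minimal sketch of this layout in R, assuming the built-in `iris` data set as a stand-in (it is not the data used in the talk): one row per object, a factor column holding the class label, and one numeric column per feature.

```r
# Illustrative only: the built-in iris data stands in for the talk's data.
data(iris)

# One row per object; Species is the class label, the rest are features.
labels   <- iris$Species
features <- iris[, setdiff(names(iris), "Species")]

str(iris)       # structure: numeric feature columns plus one factor label
table(labels)   # number of objects per class
```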

Example Data

Data Preparation & Investigation

EDA Technique

Box Plots, PCA, Decision Trees, Clustering

Training Set

• Best features to distinguish between classes

• Relationships between features

• Feature reduction

Box Plots
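As a sketch of how box plots can show which features separate the classes, the following uses base R's `boxplot()` on the built-in `iris` data (an assumed stand-in for the talk's data). Boxes that barely overlap between classes suggest a useful feature.

```r
data(iris)

# One box per class for a single feature; well-separated boxes suggest
# the feature is useful for telling the classes apart.
bp <- boxplot(Petal.Length ~ Species, data = iris,
              xlab = "Class", ylab = "Petal.Length")

# All features at once, one panel per feature.
feature_names <- setdiff(names(iris), "Species")
op <- par(mfrow = c(2, 2))
for (f in feature_names) {
  boxplot(iris[[f]] ~ iris$Species, main = f, xlab = "Class", ylab = f)
}
par(op)   # restore the previous plotting layout
```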

PCA & Multivariate Analysis: ade4, FactoMineR
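The slide names the ade4 and FactoMineR packages. The sketch below swaps in base R's `prcomp()` so that it runs without extra packages; `ade4::dudi.pca()` and `FactoMineR::PCA()` provide comparable analyses. The `iris` data is again only an assumed example.

```r
data(iris)
features <- iris[, 1:4]

# Centre and scale so that no feature dominates because of its units.
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores on the first two components, coloured by class.
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19)
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

If the first few components carry most of the variance, they can be used as a reduced feature set.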

Example Classifier

Classification Algorithms in R

Rattle: R Analytical Tool To Learn Easily (Rattle: A Data Mining GUI for R, Graham J. Williams, The R Journal, 1(2):45-55)

SVM

Ensemble Algorithm

Training and Testing

Classification Algorithm: Neural Network, Support Vector Machine, Random Forest

Training Set (labelled)

Test Set (unlabelled)

Trained Classifier

Classification Results

Prediction Results + Labels

Assess Predictions: Confusion Matrix, ROC Curve (2 categories), …

Using Classifiers in R

Select Training Data

Build Classifier

Run Classifier

classifier <- algorithm(formula, data, options)

(boosting and nnet)

classifier.pred <- predict(classifier, newdata, options)
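The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the talk's own code: `rpart` (a decision tree) is used as the algorithm because it ships with standard R installations, `iris` stands in for the data, and the 70/30 split and the seed are arbitrary choices.

```r
library(rpart)   # recommended package shipped with standard R installations

data(iris)
set.seed(1)      # arbitrary seed so the split is reproducible

# Select training data: a random 70% of rows; the rest is held out for testing.
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Build classifier: algorithm(formula, data, options)
classifier <- rpart(Species ~ ., data = train, method = "class")

# Run classifier: predict(classifier, newdata, options)
classifier.pred <- predict(classifier, newdata = test, type = "class")

head(classifier.pred)
```

Other algorithms follow the same formula-and-data pattern, although the accepted options differ from package to package.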

SVM & Neural Net Tuning
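One way to tune a neural network is a plain grid search over its parameters, scored on held-out data. The sketch below does this by hand for `nnet`; the grid values, the split, and the seed are hypothetical, and the talk may have used a different procedure. The e1071 package offers `tune.svm()` and `tune.nnet()` helpers for the same purpose.

```r
library(nnet)    # recommended package shipped with standard R installations

data(iris)
set.seed(1)      # arbitrary seed for a reproducible split and initial weights
train_idx <- sample(nrow(iris), size = 100)
train <- iris[train_idx, ]
valid <- iris[-train_idx, ]

# Hypothetical grid: hidden-layer size and weight decay.
grid <- expand.grid(size = c(1, 2, 4), decay = c(0, 0.01, 0.1))
grid$error <- NA_real_

for (i in seq_len(nrow(grid))) {
  fit <- nnet(Species ~ ., data = train, size = grid$size[i],
              decay = grid$decay[i], maxit = 200, trace = FALSE)
  pred <- predict(fit, newdata = valid, type = "class")
  grid$error[i] <- mean(pred != valid$Species)   # validation error rate
}

# Parameter combination with the lowest validation error.
best <- grid[which.min(grid$error), ]
print(best)
```

The validation rows used for tuning should be kept separate from the final test set, otherwise the reported test error is optimistic.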

Classifier Feedback

print(classifier)

plot(classifier)

high Gini Coefficient = high dispersion

Classifier Prediction Results

predict(type = "class")

predict(type = "prob")

confusion matrix
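A sketch of the two prediction types and the resulting confusion matrix, assuming an `rpart` tree on the `iris` data as the example classifier. The accepted `type` values vary between classifier packages, so the corresponding `predict` help page is the reference for any other algorithm.

```r
library(rpart)

data(iris)
set.seed(1)      # arbitrary seed so the split is reproducible
train_idx <- sample(nrow(iris), size = 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

classifier <- rpart(Species ~ ., data = train, method = "class")

# Hard class labels, one per test object.
pred_class <- predict(classifier, newdata = test, type = "class")

# Class membership probabilities, one column per class.
pred_prob <- predict(classifier, newdata = test, type = "prob")
head(pred_prob)

# Confusion matrix: predicted class against the true label.
confusion <- table(Predicted = pred_class, Actual = test$Species)
print(confusion)

# Overall accuracy is the proportion of counts on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```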

Binary Classification Results

                       Class Present?
                       Y                 N
Class Detected?   Y    True Positive     False Positive
                  N    False Negative    True Negative

$$\mathrm{TPR} = \frac{TP}{TP + FN} = \text{Sensitivity}$$

$$\mathrm{FPR} = \frac{FP}{TN + FP} = 1 - \text{Specificity}$$
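As a worked example of these two rates, the sketch below counts the four outcomes for ten hypothetical objects (the vectors are made up for illustration) and applies the formulas directly.

```r
# Hypothetical labels for ten objects (TRUE = class present / detected).
present  <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
detected <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

TP <- sum(detected & present)     # detected and present
FN <- sum(!detected & present)    # missed although present
FP <- sum(detected & !present)    # detected although absent
TN <- sum(!detected & !present)   # correctly not detected

TPR <- TP / (TP + FN)             # sensitivity
FPR <- FP / (TN + FP)             # 1 - specificity
specificity <- TN / (TN + FP)

print(c(TPR = TPR, FPR = FPR, specificity = specificity))
```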

ROC Curves in R

ROCR package
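A ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered. The sketch below first builds the curve in base R from hypothetical scores and labels, so that it runs without extra packages, and then draws the same curve with ROCR's `prediction()` and `performance()` if that package is installed.

```r
# Hypothetical scores (higher = more likely positive) and true labels.
scores <- c(0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10)
labels <- c(1,    1,    0,    1,    1,    0,    0,    1,    0,    0)

# Base R: sweep the threshold down through the sorted scores.
ord <- order(scores, decreasing = TRUE)
tpr <- c(0, cumsum(labels[ord] == 1) / sum(labels == 1))
fpr <- c(0, cumsum(labels[ord] == 0) / sum(labels == 0))

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)

plot(fpr, tpr, type = "s", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)   # chance line

# The same curve with ROCR, if the package is installed.
if (requireNamespace("ROCR", quietly = TRUE)) {
  library(ROCR)
  pred <- prediction(scores, labels)
  perf <- performance(pred, measure = "tpr", x.measure = "fpr")
  plot(perf)
}
```

For a classifier with more than two classes, one common approach is to draw one curve per class, treating that class as positive and all others as negative.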

Example Results