Department of Computer and Information Science (IDA), Linköping University, Sweden
LiU expanding reality
SVM in R using e1071: classification of extracted phrases
Overview
• Links: bit.ly/fD9red
• Software
  • JGR & Deducer
  • e1071 package
• Dataset
• Data massage in R
• SVM model training and test data prediction
Software
JGR & Deducer
• JGR (Java GUI for R, pronounced “Jaguar”)
  • GUI with more functions than both the default GUI and RStudio (e.g. easy package management, Excel-like data views)
• Deducer
  • “Deducer is designed to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab”
  • Plot builder, t-tests, etc.
JGR/Deducer installation
• Download JGR
• install.packages(c("JGR","Deducer"))
• library(Deducer)
e1071
• Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...
• Interface to LIBSVM
• install.packages("e1071")
• library(e1071)
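To check that the package works after installation, a minimal sketch like the following may help. It uses the built-in iris data as a stand-in (not the phrase dataset from these slides), and the guard is an assumption so that the snippet does nothing when e1071 is not installed.

```r
# Assumption: e1071 may or may not be installed; skip the demo if it is not
if (requireNamespace("e1071", quietly = TRUE)) {
  # iris ships with R and is only a stand-in dataset here
  iris_model <- e1071::svm(Species ~ ., data = iris)
  # Predict on the training data, only to confirm that the calls work
  iris_pred <- predict(iris_model, iris)
  print(table(iris_pred, iris$Species))
}
```

Because Species is a factor, svm() fits a classification model by default.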
Dataset
Initial data: 1800 observations
Columns by group:
Basic: class, phrase, numberOfWords
BNC LM: bncZeroProbs, bncLogProb, bncPpl, bncPpl1
C04B LM: c04bZeroProbs, c04bLogProb, c04bPpl, c04bPpl1
Tasks/Problems
• Read data
• Normalize values
• New column/feature
• Random ordering
• Column selection
• Train/Test partitioning
Read data
xphrases = read.csv(file.choose(new=FALSE), header=TRUE)
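file.choose() opens a file dialog, so the call above only works in an interactive session. A minimal sketch of a scripted read follows. The temporary file and its two rows are made-up stand-ins for the real phrase file, and converting class to a factor is an assumption that matters on R 4.0 and later, where read.csv() no longer turns strings into factors by default.

```r
# Hypothetical two-row stand-in for the phrase file, written to a temp path
csv_path <- tempfile(fileext = ".csv")
writeLines(c("class,phrase,numberOfWords,bncLogProb",
             "yes,support vector machine,3,-12.5",
             "no,of the and,3,-7.1"), csv_path)

# Read from a fixed path instead of a file dialog
xphrases_demo <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# A factor response makes svm() default to classification
xphrases_demo$class <- factor(xphrases_demo$class)

# Inspect column types before going further
str(xphrases_demo)
```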
New feature: logProb delta
xphrases$logProbDelta = abs(xphrases$bncLogProb - xphrases$c04bLogProb)
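As a small worked example of the new feature, the sketch below applies the same expression to two made-up rows. The values are hypothetical and only illustrate that the delta is the absolute gap between the two language-model log-probabilities.

```r
# Hypothetical log-probabilities for two phrases under the two language models
toy <- data.frame(bncLogProb  = c(-12.5, -7.0),
                  c04bLogProb = c(-9.0, -11.0))

# Same expression as on the slide, applied to the toy frame
toy$logProbDelta <- abs(toy$bncLogProb - toy$c04bLogProb)
print(toy)   # logProbDelta is 3.5 and 4.0
```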
Normalization
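xphrases.norm = xphrases  # new frame
for(index in c(3:12)) {  # for each column we want to normalize
  # na.rm=TRUE: do not use empty cells for min() and max()
  col.min = min(xphrases[,index], na.rm=TRUE)
  col.max = max(xphrases[,index], na.rm=TRUE)
  # divide by the range (max - min), not by max alone, so values land in [0, 1]
  xphrases.norm[,index] = (xphrases[,index] - col.min) / (col.max - col.min)
}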
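Min-max scaling divides by the range (max - min). The sketch below wraps that in a helper and compares it with dividing by the maximum alone on a made-up column of negative log-probabilities. The function name min_max and the sample values are assumptions for illustration.

```r
# Min-max scaling; NA cells are ignored when finding min and max
min_max <- function(x) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  if (hi == lo) return(x - lo)   # constant column: avoid dividing by zero
  (x - lo) / (hi - lo)
}

log_probs <- c(-10, -5, NA, -2)   # made-up values, not from the dataset
scaled <- min_max(log_probs)
print(scaled)                     # 0, 0.625, NA, 1

# Dividing by max alone leaves this column outside [0, 1]
by_max <- (log_probs - min(log_probs, na.rm = TRUE)) / max(log_probs, na.rm = TRUE)
print(by_max)
```

Note that svm() in e1071 also scales its inputs by default (scale = TRUE), so this step mainly matters for inspecting the data or for other models.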
Random order
nrow(<data.frame>): number of rows
sample(<collection>, <size>): random sample; given a single number n, sample(n) returns a random permutation of 1:n
xphrases.rnd.order = xphrases.norm[sample(nrow(xphrases.norm)),]
List the rows in the order specified by the random sample and save them to a new data.frame
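The shuffle gives a different order on every run. A minimal sketch with a fixed seed, on a made-up six-row frame, shows how the same idiom can be made reproducible. The seed value 42 and the toy data are assumptions.

```r
set.seed(42)   # arbitrary seed, chosen only for reproducibility
toy <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))

perm <- sample(nrow(toy))      # random permutation of 1..6
toy.rnd.order <- toy[perm, ]   # rows in the shuffled order
print(toy.rnd.order)
```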
Column selection & Test/Train
# test & train
xphrases.train = xphrases.rnd.order[1:1200,]
xphrases.test = xphrases.rnd.order[1201:1800,]
# remove unwanted columns
xphrases.test.clean = xphrases.test[c(1,3,5,6,9,10,12)]
xphrases.train.clean = xphrases.train[c(1,3,5,6,9,10,12)]
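The hard-coded row and column indices work for this 1800-row file but break if the data changes. One possible variant, sketched on a made-up 30-row frame, derives the split point from the row count and selects columns by name. The toy columns and the names n.train and keep are assumptions.

```r
set.seed(1)
# Made-up stand-in for the shuffled, normalized phrase data
toy <- data.frame(class = factor(rep(c("yes", "no"), 15)),
                  phrase = paste("phrase", 1:30),
                  bncLogProb = rnorm(30),
                  c04bLogProb = rnorm(30))
toy <- toy[sample(nrow(toy)), ]

# Two thirds for training, as with 1200 of 1800 on the slide
n.train <- (2 * nrow(toy)) %/% 3
toy.train <- toy[seq_len(n.train), ]
toy.test  <- toy[(n.train + 1):nrow(toy), ]

# Select columns by name so a reordered file does not silently change the features
keep <- c("class", "bncLogProb", "c04bLogProb")
toy.train.clean <- toy.train[keep]
toy.test.clean  <- toy.test[keep]
```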
SVM model + prediction
# svm train
model = svm(class ~ ., data = xphrases.train.clean)
# svm test
pred = predict(model, xphrases.test.clean)
table(pred, xphrases.test$class)
       xphrases.test$class
pred     no yes
  no     68  44
  yes   164 324
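The table can be summarized with a few standard metrics. The sketch below re-enters the four counts from the slide by hand (rows are predictions, columns are true classes) and computes them in base R. The variable names are assumptions.

```r
# Counts copied from the slide: rows = predicted, columns = actual
cm <- matrix(c(68, 164, 44, 324), nrow = 2,
             dimnames = list(pred = c("no", "yes"), actual = c("no", "yes")))

accuracy      <- sum(diag(cm)) / sum(cm)                # (68 + 324) / 600
precision_yes <- cm["yes", "yes"] / sum(cm["yes", ])    # 324 / 488
recall_yes    <- cm["yes", "yes"] / sum(cm[, "yes"])    # 324 / 368
baseline      <- max(colSums(cm)) / sum(cm)             # always guess the majority class

print(round(c(accuracy = accuracy, precision_yes = precision_yes,
              recall_yes = recall_yes, baseline = baseline), 3))
```

Comparing accuracy against the majority-class baseline shows how much the model adds beyond always predicting "yes".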
View results
cbind(as.character(pred), as.character(xphrases.test$class), as.character(xphrases.test$phrase))
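cbind() on character vectors returns an unnamed character matrix. A data.frame with named columns may be easier to read and filter. The sketch below uses made-up predictions, labels, and phrases in place of pred and xphrases.test.

```r
# Made-up stand-ins for pred, xphrases.test$class and xphrases.test$phrase
pred_demo   <- factor(c("yes", "no", "yes", "yes"))
actual_demo <- factor(c("yes", "yes", "no", "yes"))
phrase_demo <- c("alpha beta", "gamma", "delta epsilon", "zeta")

results <- data.frame(pred = as.character(pred_demo),
                      actual = as.character(actual_demo),
                      phrase = phrase_demo,
                      stringsAsFactors = FALSE)

# Keep only the rows where prediction and true class disagree
misclassified <- results[results$pred != results$actual, ]
print(misclassified)
```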