SVM in R using e1071 - ida.liu.sejodfo01/files/r-svm-pres.pdf · SVM in R using e1071...

Department of Computer and Information Science (IDA), Linköping University, Sweden

LiU expanding reality

SVM in R using e1071: classification of extracted phrases


Overview

• Links: bit.ly/fD9red

• Software

• JGR & Deducer

• e1071 package

• Dataset

• Data massage in R

• SVM model training and test data prediction


Software


JGR & Deducer

• JGR (Java GUI for R, pronounced “Jaguar”)

• GUI with more functions than both the default GUI and RStudio (e.g. easy package management, Excel-like data views)

• Deducer

• “Deducer is designed to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab”

• Plot builder, t-tests, etc.


JGR/Deducer installation

• Download JGR

• install.packages(c("JGR","Deducer"))

• library(Deducer)


e1071

• Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...

• Interface to LIBSVM

• install.packages("e1071")

• library(e1071)
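As a rough illustration of the svm() interface, the sketch below fits a classifier on R's built-in iris data rather than on the phrase data used in these slides. The kernel and cost values are simply the package defaults written out explicitly; they are not settings taken from the presentation.

```r
library(e1071)

# Built-in example data; Species is a factor, so svm() performs classification
data(iris)

# Defaults spelled out: radial (RBF) kernel, cost = 1
iris.model = svm(Species ~ ., data = iris, kernel = "radial", cost = 1)

# Predict on the training data (illustration only, not a proper evaluation)
iris.pred = predict(iris.model, iris)

# Confusion matrix: predicted vs. actual species
iris.table = table(iris.pred, iris$Species)
print(iris.table)
```

Predicting on the same rows the model was trained on overstates its quality, which is why the later slides hold out a separate test set.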


Dataset


Initial data: 1800 observations


Columns, by group:

• Basic: phrase, numberOfWords

• BNC LM: bncZeroProbs, bncLogProb, bncPpl, bncPpl1

• C04B LM: c04bZeroProbs, c04bLogProb, c04bPpl, c04bPpl1

• class


Tasks/Problems

• Read data

• Normalize values

• New column/feature

• Random ordering

• Column selection

• Train/Test partitioning


Read data

xphrases = read.csv(file.choose(new=FALSE), header=TRUE)
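file.choose() opens a file dialog, which is convenient interactively but does not work in a script. The sketch below reads from an explicit path instead. The temporary file, its two rows, and the name demo.phrases are made up for illustration; only the column names are borrowed from the dataset slide. Since R 4.0.0, read.csv() no longer converts strings to factors by default, so stringsAsFactors=TRUE is passed here on the assumption that the class column should be a factor, which svm() needs in order to treat it as a class label.

```r
# Hypothetical stand-in for the real CSV: two rows with a few of the columns
csv.path = tempfile(fileext = ".csv")
writeLines(c(
  "class,phrase,numberOfWords,bncLogProb,c04bLogProb",
  "yes,expanding reality,2,-7.5,-6.1",
  "no,of the,2,-3.2,-3.0"
), csv.path)

# Same call as on the slide, but with an explicit path instead of a dialog
demo.phrases = read.csv(csv.path, header = TRUE, stringsAsFactors = TRUE)

# Check column types and row count before going further
str(demo.phrases)
nrow(demo.phrases)
```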


New feature: logProb delta

xphrases$logProbDelta = abs(xphrases$bncLogProb - xphrases$c04bLogProb)


Normalization

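# New data frame to hold the normalized values
xphrases.norm = xphrases

# For each column we want to normalize
for (index in 3:12) {
  column = xphrases[, index]
  # na.rm=TRUE: do not use empty cells for min() and max()
  column.min = min(column, na.rm=TRUE)
  column.max = max(column, na.rm=TRUE)
  # Min-max scaling to [0, 1]: divide by the range, not by max alone
  xphrases.norm[, index] = (column - column.min) / (column.max - column.min)
}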
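The loop above can also be written with a small helper function. The sketch below assumes that min-max scaling to [0, 1] is the goal, which means dividing by the range (max minus min). The helper name min.max and the toy data are made up for illustration.

```r
# Hypothetical helper: scale a numeric vector to [0, 1], ignoring NA
min.max = function(x) {
  lo = min(x, na.rm = TRUE)
  hi = max(x, na.rm = TRUE)
  (x - lo) / (hi - lo)
}

# Toy data standing in for two of the numeric columns
toy = data.frame(
  bncLogProb = c(-10, -4, NA, -2),
  bncPpl     = c(100, 250, 400, 1000)
)

# Apply the helper to every column; toy.norm[] keeps the data.frame shape
toy.norm = toy
toy.norm[] = lapply(toy, min.max)
print(toy.norm)
```

Log probabilities are negative, so dividing by max alone would flip the sign of the scaled values; dividing by the range avoids that. A column whose values are all identical has a range of zero and would need special handling.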


Random order

nrow(<data.frame>): number of rows

sample(<collection>, <size>): random sample; given a single number n, sample(n) returns a random permutation of 1..n

List the rows in the order specified by the random sample and save to a new data.frame:

xphrases.rnd.order = xphrases.norm[sample(nrow(xphrases.norm)),]
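A shuffle like this gives a different order on every run, so the train/test split and the resulting numbers change too. A minimal sketch of making it repeatable with set.seed() follows; the seed value and the toy data frame are arbitrary choices for illustration, not part of the original slides.

```r
# Fix the random number generator state so the shuffle is repeatable
set.seed(42)

toy = data.frame(id = 1:6, value = c(0.1, 0.5, 0.2, 0.9, 0.4, 0.7))

# sample(n) returns a random permutation of 1..n
shuffle = sample(nrow(toy))
toy.rnd.order = toy[shuffle, ]

print(toy.rnd.order)
```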


Column selection & Test/Train

# test & train

xphrases.train = xphrases.rnd.order[1:1200,]

xphrases.test = xphrases.rnd.order[1201:1800,]

# remove unwanted columns

xphrases.test.clean = xphrases.test[c(1,3,5,6,9,10,12)]

xphrases.train.clean = xphrases.train[c(1,3,5,6,9,10,12)]
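The row ranges and column numbers above are tied to this particular file. A possible variation, sketched on made-up toy data, derives the split point from the row count and selects columns by name, so the code keeps working if the number of rows or the column order changes. The two-thirds ratio mirrors the 1200/600 split; the toy columns are only a subset of the real ones.

```r
# Toy stand-in for the shuffled, normalized data
set.seed(1)
toy = data.frame(
  class         = factor(sample(c("no", "yes"), 30, replace = TRUE)),
  phrase        = paste("phrase", 1:30),
  numberOfWords = runif(30),
  bncLogProb    = runif(30)
)

# Two thirds for training, the rest for testing (same ratio as 1200/600)
n.train = floor(nrow(toy) * 2 / 3)
toy.train = toy[1:n.train, ]
toy.test  = toy[(n.train + 1):nrow(toy), ]

# Select columns by name; the phrase text is left out of the model input
keep = c("class", "numberOfWords", "bncLogProb")
toy.train.clean = toy.train[, keep]
toy.test.clean  = toy.test[, keep]

dim(toy.train.clean)
dim(toy.test.clean)
```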


SVM model + prediction

# svm train

model = svm(class ~ ., data = xphrases.train.clean)

# svm test

pred = predict(model, xphrases.test.clean)

table(pred, xphrases.test$class)

                xphrases.test$class
pred             no   yes
  no             68    44
  yes           164   324
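The table above can be summarized as a single accuracy figure: 68 + 324 = 392 of the 600 test phrases are classified correctly, about 65%. The sketch below re-enters the counts by hand and also computes recall per class; the variable names are made up for illustration.

```r
# Confusion matrix copied from the slide: rows = predicted, columns = actual
confusion = matrix(
  c(68, 164, 44, 324), nrow = 2,
  dimnames = list(pred = c("no", "yes"), actual = c("no", "yes"))
)

# Accuracy: correct predictions (diagonal) over all test rows
accuracy = sum(diag(confusion)) / sum(confusion)

# Recall per actual class: diagonal over column totals
recall = diag(confusion) / colSums(confusion)

print(accuracy)
print(recall)
```

The per-class view shows what the overall figure hides: 164 of the 232 actual "no" phrases are predicted as "yes".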


View results

cbind(as.character(pred), as.character(xphrases.test$class), as.character(xphrases.test$phrase))
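cbind() on character vectors produces an unnamed character matrix. As an alternative sketch, a data.frame keeps named columns and makes it easy to filter for the misclassified phrases. The toy predictions and labels below are invented stand-ins for pred and xphrases.test.

```r
# Toy predictions and labels standing in for pred and xphrases.test
toy.pred = factor(c("yes", "no", "yes", "yes"), levels = c("no", "yes"))
toy.test = data.frame(
  class  = factor(c("yes", "yes", "no", "yes"), levels = c("no", "yes")),
  phrase = c("phrase a", "phrase b", "phrase c", "phrase d")
)

# A data.frame keeps named columns, unlike the character matrix from cbind()
results = data.frame(
  predicted = toy.pred,
  actual    = toy.test$class,
  phrase    = toy.test$phrase
)

# Show only the misclassified phrases
errors = results[results$predicted != results$actual, ]
print(errors)
```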
