Peter Fox
Data Analytics – ITWS-4963/ITWS-6965
Week 6b, February 28, 2014
Weighted kNN, clustering, more plotting, Bayes
Plot tools/tips
http://statmethods.net/advgraphs/layout.html
http://flowingdata.com/2014/02/27/how-to-read-histograms-and-use-them-in-r/
pairs, gpairs, scatterplot.matrix, clustergram, etc.
data()
# precip, presidents, iris, swiss, sunspot.month (!), environmental, ethanol, ionosphere
More script fragments in Lab6b_*_2014.R on the web site (escience.rpi.edu/data/DA)
Weighted KNN?
require(kknn)
data(iris)
m <- dim(iris)[1]
val <- sample(1:m, size = round(m/3), replace = FALSE,
prob = rep(1/m, m))
iris.learn <- iris[-val,]
iris.valid <- iris[val,]
iris.kknn <- kknn(Species~., iris.learn, iris.valid, distance = 1,
kernel = "triangular")
summary(iris.kknn)
fit <- fitted(iris.kknn)
table(iris.valid$Species, fit)
pcol <- as.character(as.numeric(iris.valid$Species))
pairs(iris.valid[1:4], pch = pcol, col = c("green3", "red")[(iris.valid$Species != fit)+1])
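What the triangular kernel contributes can be sketched in base R. The function below is a simplified illustration and not the kknn implementation: `weighted_knn_predict` and its arguments are hypothetical names, the predictors are not standardized, and k = 7 with Manhattan distance (the analogue of distance = 1) is an assumption made for the example.

```r
# Simplified weighted kNN for one query point (illustrative only).
# train_x: numeric matrix, train_y: factor, query: numeric vector.
weighted_knn_predict <- function(train_x, train_y, query, k = 7) {
  # Manhattan distance (Minkowski with p = 1) from the query to every training row
  d <- rowSums(abs(sweep(train_x, 2, query)))
  ord <- order(d)[seq_len(k + 1)]
  nearest <- ord[seq_len(k)]
  # Scale by the distance to the (k+1)th neighbour so the k nearest fall in [0, 1]
  scaled <- d[nearest] / max(d[ord[k + 1]], .Machine$double.eps)
  # Triangular kernel: weight falls linearly from 1 (distance 0) to 0
  w <- pmax(1 - scaled, 0)
  # Weighted vote: sum the weights per class and pick the largest
  votes <- tapply(w, train_y[nearest], sum)
  votes[is.na(votes)] <- 0
  names(which.max(votes))
}

set.seed(1)
val <- sample(nrow(iris), 50)
train_x <- as.matrix(iris[-val, 1:4])
train_y <- iris$Species[-val]
pred <- apply(as.matrix(iris[val, 1:4]), 1, weighted_knn_predict,
              train_x = train_x, train_y = train_y)
table(actual = iris$Species[val], predicted = pred)
```

Closer neighbours get larger votes, so a single far-away neighbour of another class matters less than it would under a plain (rectangular) majority vote.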
Try Lab6b_8_2014.R
New dataset - ionosphere
require(kknn)
data(ionosphere)
ionosphere.learn <- ionosphere[1:200,]
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
# vary kernel
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))
table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
#alter distance
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))
table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)
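The confusion tables are easier to compare across kernels and distances when each is reduced to one number. A minimal sketch, assuming a hypothetical helper `accuracy_from_table` and a toy table with made-up labels:

```r
# Fraction of cases on the diagonal of a square confusion table.
# Assumes rows and columns list the classes in the same order.
accuracy_from_table <- function(tab) {
  stopifnot(nrow(tab) == ncol(tab))
  sum(diag(tab)) / sum(tab)
}

# Toy table with made-up labels, only to show the call
truth <- factor(c("good", "good", "bad", "bad", "good", "bad"))
guess <- factor(c("good", "bad",  "bad", "bad", "good", "good"))
toy <- table(guess, truth)
accuracy_from_table(toy)   # 4 of the 6 labels agree
```

With the objects from the slide, the same helper could be applied to table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class) and to the fit.train2 version.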
Cluster plotting
source("http://www.r-statistics.com/wp-content/uploads/2012/01/source_https.r.txt") # source code from github
require(RCurl)
require(colorspace)
source_https("https://raw.github.com/talgalili/R-code-snippets/master/clustergram.r")
data(iris)
set.seed(250)
par(cex.lab = 1.5, cex.main = 1.2)
Data <- scale(iris[,-5]) # scaling
clustergram(Data, k.range = 2:8, line.width = 0.004) # line.width - adjust according to Y-scale
Clustergram
Any good?
set.seed(500)
Data2 <- scale(iris[,-5])
par(cex.lab = 1.2, cex.main = .7)
par(mfrow = c(3,2))
for(i in 1:6) clustergram(Data2, k.range = 2:8 , line.width = .004, add.center.points = T)
How can you tell it is good?
set.seed(250)
Data <- rbind( cbind(rnorm(100,0, sd = 0.3),rnorm(100,0, sd = 0.3),rnorm(100,0, sd = 0.3)),
cbind(rnorm(100,1, sd = 0.3),rnorm(100,1, sd = 0.3),rnorm(100,1, sd = 0.3)),
cbind(rnorm(100,2, sd = 0.3),rnorm(100,2, sd = 0.3),rnorm(100,2, sd = 0.3)))
clustergram(Data, k.range = 2:5 , line.width = .004, add.center.points = T)
More complex…
set.seed(250)
Data <- rbind( cbind(rnorm(100,1, sd = 0.3),rnorm(100,0, sd = 0.3),rnorm(100,0, sd = 0.3),rnorm(100,0, sd = 0.3)),
cbind(rnorm(100,0, sd = 0.3),rnorm(100,1, sd = 0.3),rnorm(100,0, sd = 0.3),rnorm(100,0, sd = 0.3)),
cbind(rnorm(100,0, sd = 0.3),rnorm(100,1, sd = 0.3),rnorm(100,1, sd = 0.3),rnorm(100,0, sd = 0.3)),
cbind(rnorm(100,0, sd = 0.3),rnorm(100,0, sd = 0.3),rnorm(100,0, sd = 0.3),rnorm(100,1, sd = 0.3)))
clustergram(Data, k.range = 2:8 , line.width = .004, add.center.points = T)
• Look at the location of the cluster points on the Y axis. See when they remain stable, when they start flying around, and what happens to them at higher numbers of clusters (do they regroup?).
• Observe the strands of the data points. Even if the cluster centers are not ordered, the lines for each item might (this needs more research and thinking) tend to move together, hinting at the real number of clusters.
• Run the plot multiple times to observe the stability of the cluster formation (and location).
http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/
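The advice to rerun the plot and watch for stability can also be checked numerically. A minimal sketch in base R, assuming the scaled iris data from the earlier slide; the object names (`wss`, `spread`) and the choice of 10 runs are illustrative only:

```r
# For each k, rerun kmeans from different random starts and record the
# total within-cluster sum of squares; a wide spread means unstable clusters.
set.seed(250)
Data <- scale(iris[, -5])
k_range <- 2:8
n_runs <- 10
wss <- sapply(k_range, function(k) {
  replicate(n_runs, kmeans(Data, centers = k, nstart = 1)$tot.withinss)
})
colnames(wss) <- paste0("k=", k_range)

# Range of the within-cluster SS over the runs, per k
spread <- apply(wss, 2, function(x) diff(range(x)))
round(spread, 2)
boxplot(wss, xlab = "number of clusters", ylab = "total within-cluster SS")
```

Values of k where the boxes are tight correspond to cluster solutions that the algorithm finds repeatedly; wide boxes suggest the solution depends on the starting centers.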
Swiss - pairs
pairs(~ Fertility + Education + Catholic, data = swiss, subset = Education < 20, main = "Swiss data, Education < 20")
ctree
require(party)
swiss_ctree <- ctree(Fertility ~ Agriculture + Education + Catholic, data = swiss)
plot(swiss_ctree)
Hierarchical clustering
> dswiss <- dist(as.matrix(swiss))
> hs <- hclust(dswiss)
> plot(hs)
scatterplotMatrix
require(lattice); splom(swiss)
Decision tree (reminder)
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> str(swiss)
…
Beyond plot: pairs
pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
Try Lab6b_2_2014.R - USJudgeRatings
Try hclust for iris
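One possible starting point for this exercise is sketched below. Scaling the measurements, average linkage, and cutting at 3 groups are choices made for the illustration; the swiss example on the earlier slide used the hclust defaults on unscaled data.

```r
# Hierarchical clustering of the four iris measurements, cut into 3 groups
d_iris <- dist(scale(iris[, 1:4]))            # Euclidean distance on scaled data
h_iris <- hclust(d_iris, method = "average")
groups <- cutree(h_iris, k = 3)

plot(h_iris, labels = FALSE, main = "hclust on iris (average linkage)")
# Cross-tabulate the clusters against the known species
table(cluster = groups, species = iris$Species)
```

Changing `method` (for example "complete" or "ward.D2") and comparing the resulting tables shows how sensitive the grouping is to the linkage rule.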
require(gpairs)
gpairs(iris)
Try Lab6b_3_2014.R
Better scatterplots
install.packages("car")
require(car)
scatterplotMatrix(iris)
Try Lab6b_4_2014.R
splom(iris) # default
Try Lab6b_7_2014.R
splom extra!
require(lattice)
super.sym <- trellis.par.get("superpose.symbol")
splom(~iris[1:4], groups = Species, data = iris,
panel = panel.superpose,
key = list(title = "Three Varieties of Iris",
columns = 3,
points = list(pch = super.sym$pch[1:3],
col = super.sym$col[1:3]),
text = list(c("Setosa", "Versicolor", "Virginica"))))
splom(~iris[1:3]|Species, data = iris,
layout=c(2,2), pscales = 0,
varnames = c("Sepal\nLength", "Sepal\nWidth", "Petal\nLength"),
page = function(...) {
ltext(x = seq(.6, .8, length.out = 4),
y = seq(.9, .6, length.out = 4),
labels = c("Three", "Varieties", "of", "Iris"),
cex = 2)
})
parallelplot(~iris[1:4] | Species, iris)
parallelplot(~iris[1:4], iris, groups = Species,
horizontal.axis = FALSE, scales = list(x = list(rot = 90)))
> Lab6b_7_2014.R
Ctree
> iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)
> print(iris_ctree)
Conditional inference tree with 4 terminal nodes
Response: Species
Inputs: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations: 150
1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  2)* weights = 50
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
      5)* weights = 46
    4) Petal.Length > 4.8
      6)* weights = 8
  3) Petal.Width > 1.7
    7)* weights = 46
plot(iris_ctree)
Try Lab6b_5_2014.R
> plot(iris_ctree, type="simple") # try this
Try these on mapmeans, etc.
Something simpler – kmeans and…
> mapmeans<-data.frame(as.numeric(mapcoord$NEIGHBORHOOD), adduse$GROSS.SQUARE.FEET, adduse$SALE.PRICE, adduse$'querylist$latitude', adduse$'querylist$longitude')
> mapobjnew<-kmeans(mapmeans,5, iter.max=10, nstart=5, algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"))
> fitted(mapobjnew,method=c("centers","classes"))
• Others?
Plotting clusters (DIY)
library(cluster)
clusplot(mapmeans, mapobj$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)
# Centroid Plot against 1st 2 discriminant functions
library(fpc) # needed for plotcluster
plotcluster(mapmeans, mapobj$cluster)
• dendrogram?
• library(fpc): cluster.stats
Bayes
> cl <- kmeans(iris[,1:4], 3)
> table(cl$cluster, iris[,5])
    setosa versicolor virginica
  2      0          2        36
  1      0         48        14
  3     50          0         0
#
> require(e1071) # provides naiveBayes
> m <- naiveBayes(iris[,1:4], iris[,5])
> table(predict(m, iris[,1:4]), iris[,5])
             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47
pairs(iris[1:4],main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
Digging into iris
classifier<-naiveBayes(iris[,1:4], iris[,5])
table(predict(classifier, iris[,-5]), iris[,5], dnn=list('predicted','actual'))
classifier$apriori
classifier$tables$Petal.Length
plot(function(x) dnorm(x, 1.462, 0.1736640), 0, 8, col="red", main="Petal length distribution for the 3 different species")
curve(dnorm(x, 4.260, 0.4699110), add=TRUE, col="blue")
curve(dnorm(x, 5.552, 0.5518947 ), add=TRUE, col = "green")
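The numbers typed into the dnorm calls look like the per-species mean and standard deviation of Petal.Length, which is what classifier$tables$Petal.Length reports for a numeric predictor. A short sketch that recomputes them from the data instead of hard-coding them (the names `pl_mean`, `pl_sd` and `cols` are illustrative):

```r
# Per-species mean and standard deviation of Petal.Length
pl_mean <- tapply(iris$Petal.Length, iris$Species, mean)
pl_sd   <- tapply(iris$Petal.Length, iris$Species, sd)
round(cbind(mean = pl_mean, sd = pl_sd), 7)

# Same curves as in the slide code, with parameters taken from the data
cols <- c(setosa = "red", versicolor = "blue", virginica = "green")
curve(dnorm(x, pl_mean["setosa"], pl_sd["setosa"]), 0, 8, col = cols["setosa"],
      ylab = "density",
      main = "Petal length distribution for the 3 different species")
for (sp in c("versicolor", "virginica")) {
  curve(dnorm(x, pl_mean[sp], pl_sd[sp]), add = TRUE, col = cols[sp])
}
```

Where two curves overlap (versicolor and virginica), Petal.Length alone cannot separate the classes, which is consistent with the few misclassifications in the confusion table.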
Using a contingency table
> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
      No      Yes
0.676965 0.323035

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122

Try Lab6b_9_2014.R
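The probabilities in this output can be reproduced directly from the margins of the Titanic table, which shows where a naive Bayes model for discrete predictors gets its numbers. A sketch in base R; the object names are illustrative:

```r
# Dimensions of the built-in table are Class, Sex, Age, Survived (in that order)
names(dimnames(Titanic))

# A-priori probabilities: share of each Survived level over all passengers and crew
prior <- prop.table(margin.table(Titanic, 4))
round(prior, 6)

# Conditional probabilities P(Class | Survived): Survived-by-Class counts,
# with each row divided by its row total
class_given_survived <- prop.table(margin.table(Titanic, c(4, 1)), 1)
round(class_given_survived, 8)
```

The Sex and Age blocks follow the same pattern with c(4, 2) and c(4, 3) as the margins.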
http://www.ugrad.stat.ubc.ca/R/library/mlbench/html/HouseVotes84.html
require(mlbench)
data(HouseVotes84)
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,-1])
predict(model, HouseVotes84[1:10,-1], type = "raw")
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)
Exercise for you
> data(HairEyeColor)
> mosaicplot(HairEyeColor)
> margin.table(HairEyeColor,3)
Sex
  Male Female
   279    313
> margin.table(HairEyeColor,c(1,3))
       Sex
Hair    Male Female
  Black   56     52
  Brown  143    143
  Red     34     37
  Blond   46     81
How would you construct a naïve Bayes classifier and test it?
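One way to approach the exercise is sketched below in base R, so that every step is visible. The choices here are assumptions, not part of the slides: Sex is taken as the class to predict, a third of the cases is held out for testing, and no smoothing is applied. With e1071 loaded, the training step could instead be written as naiveBayes(Sex ~ Hair + Eye, data = train).

```r
# Expand the 3-way table into one row per person
counts <- as.data.frame(HairEyeColor)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("Hair", "Eye", "Sex")]

# Hold out a third of the rows for testing
set.seed(42)
test_idx <- sample(nrow(people), round(nrow(people) / 3))
train <- people[-test_idx, ]
test  <- people[test_idx, ]

# Training: class prior and P(feature value | Sex), estimated from counts
prior  <- prop.table(table(train$Sex))
p_hair <- prop.table(table(train$Sex, train$Hair), 1)
p_eye  <- prop.table(table(train$Sex, train$Eye), 1)

# Prediction: prior times the product of the per-feature conditionals
score <- sapply(levels(train$Sex), function(s) {
  prior[[s]] * p_hair[s, as.character(test$Hair)] * p_eye[s, as.character(test$Eye)]
})
pred <- factor(colnames(score)[max.col(score, ties.method = "first")],
               levels = levels(train$Sex))

# Testing: confusion table and accuracy on the held-out rows
conf <- table(predicted = pred, actual = test$Sex)
conf
acc <- sum(diag(conf)) / sum(conf)
acc
```

Hair and eye color carry little information about sex, so an accuracy not far from the majority-class rate would not be surprising; the point of the exercise is the train/predict/test workflow.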
Assignment 5
• Project proposals…
• Let’s look at it
• Assignment 4 - how is it going – assume you all start after today?
Assignment 6 preview
• Your term projects should fall within the scope of a data analytics problem of the type you have worked with in class/labs, or know of yourself; the bigger the data the better. This means that the work must go beyond just making lots of figures. You should develop the project to show that you are thinking about and exploring the relationships and distributions within your data. Start with a hypothesis, think of a way to model and use the hypothesis, find or collect the necessary data, and do preliminary analysis, detailed modeling, and a summary (interpretation).
– Note: You do not have to come up with a positive result, i.e. disproving the hypothesis is just as good. Please use the section numbering below for your written submission for this assignment.
• Introduction (2%)
• Data Description (3%)
• Analysis (8%)
• Model Development (8%)
• Conclusions and Discussion (4%)
• Oral presentation (5%) (10 mins)
Assignments to come
• Term project (6). Due ~ week 13/14 – early May. 30% (25% written, 5% oral; individual). Available after spring break.
• Assignment 7: Predictive and Prescriptive Analytics. Due ~ week 10. 15% (15% written; individual).
Admin info (keep/print this slide)
• Class: ITWS-4963/ITWS 6965
• Hours: 12:00pm-1:50pm Tuesday/Friday
• Location: SAGE 3101
• Instructor: Peter Fox
• Instructor contact: [email protected], 518.276.4862 (do not leave a msg)
• Contact hours: Monday** 3:00-4:00pm (or by email appt)
• Contact location: Winslow 2120 (sometimes Lally 207A announced by email)
• TA: Lakshmi Chenicheri [email protected]
• Web site: http://tw.rpi.edu/web/courses/DataAnalytics/2014
– Schedule, lectures, syllabus, reading, assignments, etc.