R language tutorial
Confidential | Copyright 2013 Trend Micro Inc.
David Chiu
R Language Tutorial
04/11/2023
Background of R
What is R?
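• GNU project; an implementation of the S language, which John Chambers and colleagues developed at Bell Labs (R itself was created by Ross Ihaka and Robert Gentleman at the University of Auckland)
• Free software environment for statistical computing and graphics
• Functional programming language implemented primarily in C, Fortran, and R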
R Language
• R is a functional programming language
• R is an interpreted language
• R is an object-oriented language
Why Use R
• Statistical analysis on the fly
• Built-in mathematical functions and graphics modules
• FREE! & Open Source! – http://cran.r-project.org/src/base/
Kaggle
http://www.kaggle.com/
R is the language most widely used by Kaggle participants
Data Scientists at These Companies Use R
What is your programming language of choice, R, Python or something else?
“I use R, and occasionally matlab, for data analysis. There is a large, active and extremely knowledgeable R community at Google.”
– http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/
“Expert knowledge of SAS (With Enterprise Guide/Miner) required and candidates with strong knowledge of R will be preferred”
– http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook
Commercial support for R
• In 2007, Revolution Analytics began providing commercial support for Revolution R
– http://www.revolutionanalytics.com/products/revolution-r.php
– http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
• Oracle Big Data Appliance integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with the Exadata hardware
– http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
Revolution R
• The Community version is free
– http://www.revolutionanalytics.com/downloads/
– http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
| Benchmark | Base R 2.14.2 64 | Revolution R (1-core) | Revolution R (4-core) | Speedup (4 core) |
|---|---|---|---|---|
| Matrix Calculation | 17.4 sec | 2.9 sec | 2.0 sec | 7.9x |
| Matrix Functions | 10.3 sec | 2.0 sec | 1.2 sec | 7.8x |
| Program Control | 2.7 sec | 2.7 sec | 2.7 sec | Not Appreciable |
IDE
RStudio
• http://www.rstudio.com/
RGui
• http://www.r-project.org/
Web App Development
Shiny makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use
http://www.rstudio.com/shiny/
Package Management
• CRAN (Comprehensive R Archive Network)
| Repository | URL |
|---|---|
| CRAN | http://cran.r-project.org/web/packages/ |
| Bioconductor | http://www.bioconductor.org/packages/release/Software.html |
| R-Forge | http://r-forge.r-project.org/ |
R Basics
Basic Commands
• help()
– help(demo)
• demo()
– demo(is.things)
• q()
• ls()
• rm()
– rm(x)
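As a minimal sketch of how the workspace commands fit together, the snippet below creates two objects, lists them, and removes one. The object names `x` and `y` are illustrative only; `help()` and `q()` are left out because they open a help page and quit the session, respectively.

```r
x <- 3                          # create an object in the workspace
y <- "hello"                    # a second, illustrative object
ls()                            # list the names of objects in the current environment
rm(x)                           # remove x from the workspace
exists("x", inherits = FALSE)   # check whether x is still defined here
```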
Basic Objects
• Vector
• List
• Factor
• Array
• Matrix
• Data Frame
Objects & Arithmetic
• Scalar
– x = 3; y <- 5; x + y
• Vectors
– x = c(1,2,3,7); y = c(2,3,5,1); x + y; x * y; x - y; x / y
– x = seq(1,10); y = 2:11; x + y
– x = seq(1,10,by=2); y = seq(1,10,length=2)
– rep(c(5,8), 3)
– x = c(1,2,3); length(x)
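The vector operations above are element-wise. The following sketch reruns a few of them with the results noted in comments; the names `odds`, `ends`, `reps`, and `recycled` are introduced here for illustration, and the last line adds an assumption-free but extra example of R's recycling rule.

```r
x <- c(1, 2, 3, 7)
y <- c(2, 3, 5, 1)
x + y                                  # element-wise sum: 3 5 8 8
x * y                                  # element-wise product: 2 6 15 7

odds <- seq(1, 10, by = 2)             # 1 3 5 7 9
ends <- seq(1, 10, length.out = 2)     # 1 10
reps <- rep(c(5, 8), 3)                # 5 8 5 8 5 8

# Recycling: the shorter vector is repeated to match the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)  # 11 22 13 24
```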
Summaries and Subscripting
• Summary
– x = c(1,2,3,4,5,6,7,8,9,10)
– mean(x); min(x); median(x); max(x); var(x)
– summary(x)
• Subscripting
– x = c(1,2,3,4,5,6,7,8,9,10)
– x[1:3]; x[c(1,3,5)]
– x[c(1,3,5)] * 2 + x[c(2,2,2)]
– x[-(1:6)]
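A short sketch of the summary and subscripting calls, with results in comments. The variable names on the left-hand side and the final logical-index example are additions for illustration, not part of the original slide.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

mean(x)                                        # 5.5
median(x)                                      # 5.5
var(x)                                         # sample variance, about 9.17

first.three <- x[1:3]                          # positions 1 to 3: 1 2 3
combined <- x[c(1, 3, 5)] * 2 + x[c(2, 2, 2)]  # (1,3,5)*2 + (2,2,2): 4 8 12
tail.part <- x[-(1:6)]                         # negative index drops positions 1 to 6: 7 8 9 10
large <- x[x > 7]                              # logical index keeps matching elements: 8 9 10
```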
Lists
• Contain a heterogeneous selection of objects
– e <- list(thing="hat", size="8.25"); e
– l <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
– l$j
– man = list(name="Qoo", height=183); man$name
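The sketch below shows the difference between the three common ways to reach into a list. The `hobbies` element is a made-up addition used only to show that a list can hold elements of different types and lengths.

```r
man <- list(name = "Qoo", height = 183)

man$name                  # access by name: "Qoo"
man[["height"]]           # double brackets return the element itself: 183
sub <- man["height"]      # single brackets return a list of length 1

# Lists can grow and hold mixed types (hobbies is illustrative)
man$hobbies <- c("tennis", "chess")
length(man)               # 3
str(man)                  # compact display of the structure
```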
Factor
• Ordered collection of items used to represent categorical values
• The different values that a factor can take are called levels
• Factors
– phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone', 'samsung'))
– levels(phone)
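To make the idea of levels concrete, this sketch inspects the factor from the slide and then builds an ordered factor. The `sizes` example and its level names are assumptions added for illustration.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

levels(phone)                 # "htc" "iphone" "samsung" (sorted alphabetically)
counts <- table(phone)        # htc 1, iphone 3, samsung 2
codes <- as.integer(phone)    # underlying integer codes: 2 1 2 3 2 3

# An ordered factor allows comparisons between levels (illustrative data)
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]           # TRUE: small < large
```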
Matrices & Arrays
• Array
– An extension of a vector to two or more dimensions
– a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12), dim=c(3,4))
• Matrices
– An extension of a vector to two dimensions (a 2-D array)
– x = c(1,2,3); y = c(4,5,6); rbind(x,y); cbind(x,y)
– x = rbind(c(1,2,3), c(4,5,6)); dim(x)
– x <- matrix(c(1,2,3,4,5,6), nrow=3)
– x <- matrix(c(1,2,3,4,5,6), nrow=3, byrow=TRUE)
– x <- matrix(c(1,2,3,4), nrow=2); y <- matrix(c(5,6), nrow=2); x %*% y
– t(matrix(c(1,2,3,4), nrow=2))
– solve(matrix(c(1,2,3,4), nrow=2))
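The following sketch works through the matrix calls with their results, and adds a 3-D array (an illustrative extra, with made-up dimensions) to show how an array goes beyond two dimensions. Note that R fills matrices column by column unless `byrow = TRUE` is given.

```r
# A 3-D array: 2 rows, 3 columns, 4 layers (illustrative dimensions)
a <- array(1:24, dim = c(2, 3, 4))
dim(a)                                 # 2 3 4
a[2, 3, 4]                             # last element: 24

m <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled by column: rows are (1,3) and (2,4)
v <- matrix(c(5, 6), nrow = 2)

prod.mv <- m %*% v                     # matrix product: 23 and 34
m.t <- t(m)                            # transpose: rows are (1,2) and (3,4)
m.inv <- solve(m)                      # inverse: rows are (-2, 1.5) and (1, -0.5)
m %*% m.inv                            # identity matrix, up to rounding
```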
Data Frame
• Useful way to represent tabular data
• Essentially a matrix with named columns; may also include non-numerical variables
• Example
– df = data.frame(a=c(1,2,3,4,5), b=c(2,3,4,5,6)); df
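A data frame can mix column types, which a matrix cannot. The sketch below uses made-up names and scores to show column access, row filtering, and a per-column type check.

```r
# Illustrative data: one character column and one numeric column
df <- data.frame(name = c("david", "andy", "qoo"),
                 score = c(80, 90, 75),
                 stringsAsFactors = FALSE)

df$score                          # one column as a vector: 80 90 75
passed <- df[df$score > 78, ]     # rows where the condition holds
nrow(passed)                      # 2
col.types <- sapply(df, class)    # name: "character", score: "numeric"
```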
Functions
• Function
– `%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
– f <- function(x) {return(x^2 + 3)}
– create.vector.of.ones <- function(n) {
    return.vector <- numeric(n)
    for (i in seq_len(n)) { return.vector[i] <- 1 }
    return.vector
  }
– create.vector.of.ones(3)
• Control Structures
– if ... else ...
– repeat, for, while
• Catching errors
– tryCatch
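The control structures and error handling listed above can be sketched as follows. The helper names `sign.label`, `sum.to`, and `safe.log` are hypothetical and exist only for this example; `tryCatch` is the base R function for handling conditions.

```r
# Custom infix operator, as on the slide
`%myop%` <- function(a, b) 2 * a + 2 * b
1 %myop% 1                          # 4

# if ... else chain
sign.label <- function(x) {
  if (x > 0) "positive" else if (x < 0) "negative" else "zero"
}

# while loop that sums 1..n
sum.to <- function(n) {
  total <- 0
  i <- 1
  while (i <= n) {
    total <- total + i
    i <- i + 1
  }
  total
}

# tryCatch turns warnings and errors into a fallback value
safe.log <- function(x) {
  tryCatch(log(x),
           warning = function(w) NA_real_,
           error = function(e) NA_real_)
}

safe.log(-1)     # log(-1) raises a warning, so NA is returned
safe.log("a")    # log("a") raises an error, so NA is returned
```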
Anonymous Functions
• Functional language characteristic
– apply.to.three <- function(f) {f(3)}
– apply.to.three(function(x) {x * 7})
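Because functions are ordinary values in R, they can be passed in and returned. This sketch repeats the slide's example and adds two illustrative extras: an anonymous function inside `sapply`, and a closure built by the hypothetical helper `make.multiplier`.

```r
apply.to.three <- function(f) f(3)
apply.to.three(function(x) x * 7)              # 21

# Anonymous function passed to sapply
squares <- sapply(1:4, function(x) x^2)        # 1 4 9 16

# A function that returns a function (closure); k is remembered
make.multiplier <- function(k) function(x) x * k
times.seven <- make.multiplier(7)
times.seven(3)                                 # 21
```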
Objects and Classes
• All R code manipulates objects.
• Every object in R has a type
• In assignment statements, R behaves as if it copies the object, not just the reference to the object
• Attributes
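A minimal sketch of the points above: every value has a type, assignment gives value (copy) semantics, and objects can carry attributes. The attribute name `units` is an arbitrary choice for illustration.

```r
# Every object has a type
typeof(1)                   # "double"
typeof(1L)                  # "integer"
typeof("a")                 # "character"
typeof(function(x) x)       # "closure"

# Assignment behaves like a copy: changing b leaves a untouched
a <- c(1, 2, 3)
b <- a
b[1] <- 100
a                           # still 1 2 3

# Attributes attach metadata to an object
len <- 1:3
attr(len, "units") <- "cm"
attributes(len)$units       # "cm"
```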
S3 & S4 Objects
• Many R functions were implemented using S3 methods
• In S version 4 (hence S4), formal classes and methods were introduced that allowed
– Method dispatch on multiple arguments
– Abstract types
– Inheritance
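For contrast with S4, here is a minimal sketch of the informal S3 style: a class is just an attribute, and a generic dispatches on it through `UseMethod`. The generic `describe` and the class name `student` are hypothetical names chosen for this example.

```r
# An S3 object is an ordinary list with a class attribute
student <- structure(list(name = "david", score = 80), class = "student")

# Generic function plus methods named generic.class
describe <- function(x, ...) UseMethod("describe")
describe.student <- function(x, ...) paste(x$name, "scored", x$score)
describe.default <- function(x, ...) "no description available"

describe(student)    # dispatches to describe.student: "david scored 80"
describe(42)         # falls back to describe.default
```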
OOP with S4
• S4 OOP Example
– setClass("Student", representation(name = "character", score = "numeric"))
– studenta = new("Student", name="david", score=80)
– studentb = new("Student", name="andy", score=90)
– setMethod("show", signature("Student"), function(object) { cat(object@score+100) })
– setGeneric("getscore", function(object) standardGeneric("getscore"))
– studenta
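The slide declares the `getscore` generic but never attaches a method to it. The sketch below completes the picture under that assumption: the `getscore` method body and the friendlier `show` method are additions, and the last lines illustrate that S4 checks slot types when an object is created.

```r
library(methods)

setClass("Student", representation(name = "character", score = "numeric"))

# Generic plus a method for Student (the method body is an assumption)
setGeneric("getscore", function(object) standardGeneric("getscore"))
setMethod("getscore", signature("Student"), function(object) object@score)

# show() controls how the object prints at the console
setMethod("show", signature("Student"), function(object) {
  cat("Student", object@name, "with score", object@score, "\n")
})

studenta <- new("Student", name = "david", score = 80)
studentb <- new("Student", name = "andy", score = 90)

getscore(studenta)    # 80
studenta              # printed through the show method

# Slot types are validated: a character score is rejected
bad <- tryCatch(new("Student", name = "eve", score = "high"),
                error = function(e) "rejected")
```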
Packages
• A package is a related set of functions, help files, and data files that have been bundled together.
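• Basic Commands
– library(rpart)  # Load an installed package
– install.packages("rpart")  # Install a package from CRAN
– (.packages())  # List the currently attached packages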
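A hedged sketch of a typical package workflow: check what is attached, test whether a package is installed, and load it only if it is available. `rpart` is used because the slide mentions it; the installation call is shown only in a message so the example does not touch the network.

```r
attached <- (.packages())            # character vector of attached packages
"base" %in% attached                 # TRUE: base is always attached

# requireNamespace() reports availability without attaching the package
has.rpart <- requireNamespace("rpart", quietly = TRUE)

if (has.rpart) {
  library(rpart)                     # attach the package
} else {
  message("rpart is not installed; install.packages(\"rpart\") would fetch it from CRAN")
}
```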
Packages Used in Machine Learning for Hackers
Apply
• apply
– Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.
– data <- cbind(c(1,2),c(3,4))
– data.rowsum <- apply(data,1,sum)
– data.colsum <- apply(data,2,sum)
– data
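The margin argument decides the direction: 1 means rows, 2 means columns. The sketch below reruns the slide's example with results in comments; the `scaled` line is an illustrative extra showing both margins at once.

```r
data <- cbind(c(1, 2), c(3, 4))       # rows are (1,3) and (2,4)

data.rowsum <- apply(data, 1, sum)    # sum over each row: 4 6
data.colsum <- apply(data, 2, sum)    # sum over each column: 3 7

# Built-in shortcuts give the same results
rowSums(data)
colSums(data)

# Margin c(1, 2) applies the function to every cell
scaled <- apply(data, c(1, 2), function(v) v * 10)
```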
Apply
• lapply – returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
• sapply – is a user-friendly version and wrapper of lapply, by default returning a vector, matrix or, if simplify = "array", an array.
• vapply – is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
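A side-by-side sketch of the three functions on the same input. The list `scores` and its contents are made up for illustration; the last call shows how `vapply` fails loudly when the result type does not match the declared template.

```r
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))

means.list <- lapply(scores, mean)                # list: a = 2, b = 5.5
means.vec <- sapply(scores, mean)                 # named numeric vector: a = 2, b = 5.5
means.safe <- vapply(scores, mean, numeric(1))    # same, but the type is checked

# vapply raises an error if FUN does not return the declared type
mismatch <- tryCatch(vapply(scores, mean, character(1)),
                     error = function(e) "type mismatch")
```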
File IO
• Save and Load
– x = USPersonalExpenditure
– save(x, file="~/test.RData")
– rm(x)
– load("~/test.RData")
– x
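The round trip above can be sketched as a self-checking example. To avoid writing into the home directory, this version uses a temporary file (an assumption that differs from the slide's `~/test.RData`); `original` is an extra copy kept only to verify the result.

```r
path <- tempfile(fileext = ".RData")   # temporary file instead of ~/test.RData

x <- USPersonalExpenditure             # built-in data set from the datasets package
original <- x                          # keep a copy for comparison

save(x, file = path)                   # write x to disk
rm(x)                                  # remove it from the workspace
load(path)                             # restores the object under its old name, x

round.trip.ok <- identical(x, original)
unlink(path)                           # clean up the temporary file
```

`save` and `load` keep the object name; when a single object should be restored under a different name, `saveRDS` and `readRDS` are the usual alternative.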
Charts and Graphics
Plotting Example
– xrange = range(as.numeric(colnames(USPersonalExpenditure)))
– yrange = range(USPersonalExpenditure)
– plot(xrange, yrange, type="n", xlab="Year", ylab="Expenditure (billions of dollars)")
– for (i in 1:5) {
    lines(as.numeric(colnames(USPersonalExpenditure)), USPersonalExpenditure[i,], type="b", lwd=1.5)
  }
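As an alternative sketch of the same chart, `matplot` can draw all five categories in one call, and a legend identifies each line. Writing to a temporary PDF file is an assumption made so the example also runs without a display; the colours and symbols are arbitrary choices.

```r
years <- as.numeric(colnames(USPersonalExpenditure))   # 1940, 1945, ..., 1960
categories <- rownames(USPersonalExpenditure)

out <- tempfile(fileext = ".pdf")
pdf(out)                                               # open a file-based graphics device

# matplot plots each column of the matrix, so transpose: one column per category
matplot(years, t(USPersonalExpenditure), type = "b",
        pch = 1:5, lty = 1, col = 1:5,
        xlab = "Year", ylab = "Expenditure (billions of dollars)")
legend("topleft", legend = categories, col = 1:5, pch = 1:5, lty = 1)

dev.off()                                              # close the device and write the file
```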
IRIS Dataset
• data() – # List the data sets available in the attached packages, including iris
IRIS Dataset
• The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Fisher (1936) as an example of discriminant analysis. It is sometimes called Anderson's Iris data set
– http://en.wikipedia.org/wiki/Iris_flower_data_set
• Species: Iris setosa, Iris versicolor, Iris virginica
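A quick way to get a feel for the data set before modelling is to inspect its shape and per-species averages. This is a minimal sketch using only base R; the name `species.means` is introduced for illustration.

```r
data(iris)                       # iris ships with base R's datasets package

dim(iris)                        # 150 rows, 5 columns
names(iris)                      # four measurements plus Species
species.counts <- table(iris$Species)   # 50 flowers per species

# Mean of each measurement, grouped by species
species.means <- aggregate(. ~ Species, data = iris, FUN = mean)
species.means
```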
Classification of IRIS
• Classification Example
– install.packages("e1071")
– library(e1071)
– pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
– classifier <- naiveBayes(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– classifier <- svm(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– prediction = predict(classifier, iris[,1:4])
• http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes
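The example above evaluates each classifier on the same rows it was trained on. As a hedged variation, the sketch below holds out a test set so the accuracy reflects unseen data. The 100/50 split and the seed are arbitrary assumptions, and the model is fitted only if the `e1071` package is installed; otherwise `accuracy` stays `NA`.

```r
set.seed(42)                                   # arbitrary seed for a repeatable split
train.idx <- sample(nrow(iris), 100)           # 100 training rows, 50 held out
train <- iris[train.idx, ]
test <- iris[-train.idx, ]

accuracy <- NA_real_
if (requireNamespace("e1071", quietly = TRUE)) {
  model <- e1071::naiveBayes(Species ~ ., data = train)
  predicted <- predict(model, test[, 1:4])     # predict from the four measurements
  print(table(predicted, actual = test$Species))   # confusion matrix
  accuracy <- mean(predicted == test$Species)  # share of correct predictions
}
accuracy
```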
Performance Tips
• Use Built-in Math Functions
• Use Environments for Lookup Tables
• Use a Database to Query Large Data Sets
• Preallocate Memory
• Monitor How Much Memory You Are Using
• Cleaning Up Objects
• Functions for Big Data Sets
• Parallel Computation with R
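A few of these tips can be sketched in base R: preallocating a result vector, preferring a vectorised built-in, using an environment as a lookup table, and checking memory use. The sizes and key names are arbitrary illustrations, not measurements.

```r
n <- 10000

# Preallocate memory instead of growing the vector inside the loop
squares <- numeric(n)
for (i in seq_len(n)) squares[i] <- i^2

# Built-in vectorised arithmetic gives the same result without a loop
squares.vec <- (1:n)^2
same <- all.equal(squares, squares.vec)

# An environment works as a hashed lookup table (keys are illustrative)
lookup <- new.env(hash = TRUE)
assign("htc", 1, envir = lookup)
assign("iphone", 2, envir = lookup)
get("htc", envir = lookup)          # 1

# Monitor memory, then clean up objects that are no longer needed
object.size(squares)
rm(squares.vec)
invisible(gc())
```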
R for Machine Learning
Getting Help on a Topic
• ?read.delim – # Access a function's help file
• ??base::delim – # Search for 'delim' in all help files for functions in 'base'
• help.search("delimited") – # Search for 'delimited' in all help files
• RSiteSearch("parsing text") – # Search for the term 'parsing text' on the R site.
Sample Code of Chapter 1
• https://github.com/johnmyleswhite/ML_for_Hackers.git
References & Resources
Study Material
• R in a Nutshell
Online Reference
Community Resources for R help
Resources
• Websites
– Stack Overflow
– Cross Validated
– R-help
– R-devel
– R-sig-*
– Package-specific mailing lists
• Blog
– R-bloggers
• Twitter
– https://twitter.com/#rstats
• Quora
– http://www.quora.com/R-software
Resources (Cont’d)
• Conferences
– useR!
– R in Finance
– R in Insurance
– Others: Joint Statistical Meetings, Royal Statistical Society Conference
• Local User Groups
– http://blog.revolutionanalytics.com/local-r-groups.html
• Taiwan R User Group
– http://www.facebook.com/Tw.R.User
– http://www.meetup.com/Taiwan-R/
Thank You!