Data analysis with R
![Page 1: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/1.jpg)
SHARETHIS
DATA ANALYSIS with R
Hassan Namarvar
![Page 2: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/2.jpg)
2
WHAT IS R?
• R is a free programming language and software environment for statistical computing and graphics.
• It is similar to the S language, developed at AT&T Bell Labs by Rick Becker, John Chambers, and Allan Wilks.
• R was initially developed by Ross Ihaka and Robert Gentleman (1996) at the University of Auckland, New Zealand.
• R source code is written in C, Fortran, and R.
![Page 3: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/3.jpg)
3
R PARADIGMS
Multi-paradigm:
– Array
– Object-oriented
– Imperative
– Functional
– Procedural
– Reflective
![Page 4: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/4.jpg)
4
STATISTICAL FEATURES
• Graphical techniques
• Linear and nonlinear modeling
• Classical statistical tests
• Time-series analysis
• Classification
• Clustering
• Machine learning
![Page 5: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/5.jpg)
5
PROGRAMMING FEATURES
• R is an interpreted language
• Access R through a command-line interpreter
• Like MATLAB, R supports matrix arithmetic
• Data structures:
– Vectors
– Matrices
– Arrays
– Data Frames
– Lists
![Page 6: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/6.jpg)
6
ADVANTAGES OF R
• One of the most comprehensive statistical analysis packages available.
• Outstanding graphical capabilities
• Open source software, reviewed by experts
• R is free and licensed under the GNU General Public License (GPL).
• R has over 5,578 packages as of May 31, 2014!
• R is cross-platform: GNU/Linux, Mac, Windows.
• R plays well with CSV, SAS, SPSS, Excel, Access, Oracle, MySQL, and SQLite.
![Page 7: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/7.jpg)
7
HOW TO INSTALL R?
• Download and install the latest version from:
– http://cran.r-project.org
• Install packages from the R console:
– > install.packages('package_name')
• R has its own LaTeX-like documentation:
– > help()
![Page 8: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/8.jpg)
8
STARTING WITH R
• In the R console:
– > x <- 2
– > x
– > y <- x^2
– > y
– > ls()
– > rm(y)
• Vectors:
– > v <- c(4, 7, 23.5, 76.2, 80)
– > summary(v)
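The console steps above can be collected into a short script. This is a minimal sketch: the names `x`, `y`, and `v` follow the slide, while `v_scaled` and `v_big` are hypothetical additions that illustrate vectorized arithmetic and logical indexing.

```r
# Scalars: assign, compute, then remove an object from the workspace
x <- 2
y <- x^2          # y is 4
rm(y)             # y no longer exists in the workspace

# Vectors: c() combines values into a numeric vector
v <- c(4, 7, 23.5, 76.2, 80)
s <- summary(v)   # min, quartiles, mean, max (lowercase "s": R is case-sensitive)
print(s)

# Arithmetic is vectorized: the operation is applied element by element
v_scaled <- v / max(v)

# Logical indexing keeps only the elements that satisfy the condition
v_big <- v[v > 20]
```

Note that `Summary(v)` with a capital S fails, because R treats `summary` and `Summary` as different names.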
![Page 9: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/9.jpg)
9
STARTING WITH R
• Histogram:
– > r <- rnorm(100)
– > summary(r)
– > plot(r)
– > hist(r)
• QQ-Plot (Quantile):
– > qqplot(r, rnorm(1000))
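As a variation on the QQ-plot above, base R also provides `qqnorm()` and `qqline()`, which compare a sample against theoretical normal quantiles rather than against a second random sample. A small sketch; the fixed seed is an assumption added so that the run is reproducible.

```r
set.seed(42)              # assumption: fixed seed for reproducibility
r <- rnorm(100)           # 100 draws from the standard normal distribution

# hist() draws the histogram and also returns the bin information
h <- hist(r)

# Sample quantiles against theoretical normal quantiles
qqnorm(r)
qqline(r, col = "red")    # reference line through the first and third quartiles
```

Points that stay close to the reference line suggest the sample is roughly normal.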
![Page 10: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/10.jpg)
10
STARTING WITH R
• Factors:
– > g <- c('f', 'm', 'm', 'm', 'f', 'm', 'f', 'm')
– > h <- factor(g)
– > table(h)
• Matrices:
– > r <- rnorm(100)
– > dim(r) <- c(50,2)
– > r
– > summary(r)
– > M <- matrix(c(45, 23, 66, 77, 33, 44), 2, 3, byrow=TRUE)
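The factor and matrix examples above can be inspected a little further. A minimal sketch using the same values as the slide; the name `counts` is a hypothetical addition.

```r
# A factor stores categorical data as a fixed set of levels
g <- c("f", "m", "m", "m", "f", "m", "f", "m")
h <- factor(g)
levels(h)                 # the distinct categories
counts <- table(h)        # frequency of each level
print(counts)

# byrow = TRUE fills the matrix row by row instead of column by column
M <- matrix(c(45, 23, 66, 77, 33, 44), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
dim(M)                    # number of rows and columns
M[2, 1]                   # element in row 2, column 1
colSums(M)                # sum of each column
```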
![Page 11: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/11.jpg)
11
STARTING WITH R
• Data Frames:
– > n = c(2, 3, 5)
– > s = c("aa", "bb", "cc")
– > b = c(TRUE, FALSE, TRUE)
– > df = data.frame(n, s, b)
• Built-in Data Set:
– > state.x77
– > st = as.data.frame(state.x77)
– > st$Density = st$Population * 1000 / st$Area
– > summary(st)
– > cor(st)
– > pairs(st)
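A short sketch of working with the data frames above. In `state.x77`, `Population` is recorded in thousands and `Area` in square miles, so the derived `Density` column is people per square mile. The row filter and the sorting step are illustrative additions, and `densest` is a hypothetical name.

```r
# Build a data frame from three vectors of equal length
df <- data.frame(n = c(2, 3, 5),
                 s = c("aa", "bb", "cc"),
                 b = c(TRUE, FALSE, TRUE))
str(df)                         # column types at a glance
true_rows <- df[df$b, ]         # keep rows where b is TRUE

# Convert the built-in matrix to a data frame and add a derived column
st <- as.data.frame(state.x77)
st$Density <- st$Population * 1000 / st$Area

# Five densest states: sort by Density in decreasing order
densest <- head(st[order(-st$Density), c("Population", "Area", "Density")], 5)
print(densest)
```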
![Page 12: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/12.jpg)
12
STARTING WITH R
![Page 13: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/13.jpg)
13
LINEAR REGRESSION MODEL IN R
• Linear Regression Model:
– > x <- 1:100
– > y <- x^3
– Model: y = a + b · x
– > lm(y ~ x)
– > model <- lm(y ~ x)
– > summary(model)
– > par(mfrow=c(2,2))
– > plot(model)
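The fit above can also be inspected programmatically instead of reading the printed summary. A minimal sketch; `a`, `b`, and `r2` are hypothetical names for the extracted intercept, slope, and R-squared.

```r
x <- 1:100
y <- x^3                       # a cubic relationship, deliberately not linear

model <- lm(y ~ x)             # fit y = a + b * x by least squares

a  <- coef(model)[["(Intercept)"]]
b  <- coef(model)[["x"]]
r2 <- summary(model)$r.squared
cat("intercept:", a, " slope:", b, " R-squared:", r2, "\n")

# Residuals against x: a clear curve indicates the straight line misses structure
plot(x, resid(model), ylab = "Residual")
abline(h = 0, lty = 2)
```

A high R-squared does not by itself mean the model is adequate here: the residual plot shows a systematic curve because the true relationship is cubic.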
![Page 14: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/14.jpg)
14
LM MODEL
    Call:
    lm(formula = y ~ x)

    Residuals:
        Min      1Q  Median      3Q     Max
    -129827 -103680  -29649   85058  292030

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -207070.2    23299.3  -8.887 3.14e-14 ***
    x              9150.4      400.6  22.844  < 2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 115600 on 98 degrees of freedom
    Multiple R-squared: 0.8419, Adjusted R-squared: 0.8403
    F-statistic: 521.9 on 1 and 98 DF, p-value: < 2.2e-16
![Page 15: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/15.jpg)
15
LM MODEL
![Page 16: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/16.jpg)
16
DIAGNOSIS PLOT
![Page 17: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/17.jpg)
17
LINEAR REGRESSION MODEL IN R
• Model Built-in Data:
– > colnames(st)[4] = "Life.Exp"
– > colnames(st)[6] = "HS.Grad"
– > model1 = lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area + Density, data=st)
– > summary(model1)
– > model2 <- step(model1)
– > model3 = update(model2, .~.-Population)
– > summary(model3)
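The model-building steps above can be run as one self-contained script. A sketch under the assumption that `st` is built from `state.x77` as on the earlier slide; `trace = 0` is added only to silence the step-by-step log.

```r
# Rebuild the data frame used on the earlier slide
st <- as.data.frame(state.x77)
colnames(st)[4] <- "Life.Exp"   # original column name contains a space
colnames(st)[6] <- "HS.Grad"
st$Density <- st$Population * 1000 / st$Area

# Full model with every available predictor
model1 <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder +
               HS.Grad + Frost + Area + Density, data = st)

# Stepwise selection: drops terms while the AIC keeps improving
model2 <- step(model1, trace = 0)
print(formula(model2))

# Manually remove one more term from the selected model
model3 <- update(model2, . ~ . - Population)
print(summary(model3))
```

`step()` selects on AIC, not on individual p-values, so a term it keeps can still be dropped by hand with `update()` as the slide does.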
![Page 18: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/18.jpg)
18
LINEAR REGRESSION MODEL IN R
• Confidence limits on Estimated Coefficients:
– > confint(model3)
– > predict(model3, list(Murder=10.5, HS.Grad=48, Frost=100))
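A sketch of the confidence limits and prediction above. It assumes that `model3` contains the predictors `Murder`, `HS.Grad`, and `Frost`, as the `predict()` call on the slide implies, so the model is fitted directly here rather than through `step()`. The `interval = "prediction"` argument is an illustrative addition, and `new_state` is a hypothetical name.

```r
st <- as.data.frame(state.x77)
colnames(st)[4] <- "Life.Exp"
colnames(st)[6] <- "HS.Grad"

# Assumption: the reduced model has exactly these three predictors
model3 <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = st)

# 95% confidence limits for each estimated coefficient
ci <- confint(model3, level = 0.95)
print(ci)

# Predict life expectancy for a hypothetical state, with a prediction interval
new_state <- data.frame(Murder = 10.5, HS.Grad = 48, Frost = 100)
pred <- predict(model3, newdata = new_state, interval = "prediction")
print(pred)
```

The returned matrix has the columns `fit`, `lwr`, and `upr`: the point prediction and the bounds of its interval.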
![Page 19: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/19.jpg)
19
OUTLIERS
• Boxplot:
– > v <- rnorm(100)
– > v = c(v,10)
– > boxplot(v)
– > rug(jitter(v), side=2)
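Besides drawing the plot, the points that the boxplot marks as outliers can be listed with `boxplot.stats()`, which applies the same rule of 1.5 times the interquartile range beyond the hinges. A minimal sketch; the seed is an assumption for reproducibility.

```r
set.seed(1)                    # assumption: fixed seed for reproducibility
v <- c(rnorm(100), 10)         # 100 normal draws plus one injected extreme value

boxplot(v)
rug(jitter(v), side = 2)       # tick marks on the left axis show each observation

# Same rule as the plot: values beyond 1.5 * IQR from the hinges
stats <- boxplot.stats(v)
outliers <- stats$out
print(outliers)
print(which(v %in% outliers))  # positions of the flagged values
```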
![Page 20: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/20.jpg)
20
PROBABILITY DENSITY FUNCTION
• PDF:
– > r <- rnorm(1000)
– > hist(r, prob=TRUE)
– > lines(density(r), col="red")
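To see how well the kernel density estimate above matches the true distribution, the theoretical normal curve can be overlaid, and the estimate can be checked to integrate to roughly one. A sketch; the seed, the `dnorm` overlay, and the trapezoid-rule `area` check are additions for illustration.

```r
set.seed(1)                         # assumption: fixed seed for reproducibility
r <- rnorm(1000)

hist(r, probability = TRUE)         # bar areas sum to 1 instead of showing counts
d <- density(r)                     # kernel density estimate
lines(d, col = "red")
curve(dnorm(x), add = TRUE, lty = 2)  # true standard normal density

# Trapezoid rule: the estimated density should integrate to about 1
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```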
![Page 21: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/21.jpg)
21
CASE STUDY: SHARETHIS EXAMPLE
• Relationship of clicks with winning price and impressions on ADX
• Data
– Analyzed ADX hourly impression logs
• Method
– Detected outliers
– Predicted clicks using a regression tree model
![Page 22: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/22.jpg)
22
CASE STUDY: SHARETHIS EXAMPLE
• Outlier Detection:
(Figure panels: Clicks, Impressions)
![Page 23: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/23.jpg)
23
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
– One of the most powerful classification/regression methods
– > library(rpart)
– > fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data=x)
– > plot(fit)
– > text(fit)
– > plot(predict(fit), log(x$CLK))
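The original impression logs are not part of the slides, so the sketch below runs the same `rpart` call on synthetic data. Everything about the data is an assumption made for illustration: the column names are taken from the slide's formula, but the value ranges and the way `CLK` is generated from `IMP` and `AVG_PRICE` are invented.

```r
library(rpart)

set.seed(7)                       # assumption: fixed seed for reproducibility
n <- 500

# Synthetic stand-in for the hourly log data (hypothetical values)
x <- data.frame(
  IMP       = round(runif(n, 1e3, 1e5)),   # impressions per hour
  AVG_PRICE = runif(n, 0.5, 3),            # average winning price
  SD_PRICE  = runif(n, 0.1, 1)             # spread of the winning price
)
# Invented relationship: clicks grow with impressions and price, plus noise
x$CLK <- pmax(1, round(x$IMP * 0.002 * x$AVG_PRICE * exp(rnorm(n, sd = 0.3))))

# Regression tree on the log scale, as on the slide
fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)

plot(fit)                         # tree structure
text(fit)                         # split labels
plot(predict(fit), log(x$CLK))    # predicted vs. observed log clicks
abline(0, 1, lty = 2)             # perfect-prediction reference line
```

Because a tree predicts one constant per leaf, the predicted values fall on a small number of vertical bands in the last plot.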
![Page 24: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/24.jpg)
24
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
![Page 25: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/25.jpg)
25
CASE STUDY: SHARETHIS EXAMPLE
• Predict Log of Clicks
![Page 26: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/26.jpg)
26
CASE STUDY: COLOR DETECTION
• Detect color from product image:
![Page 27: Data analysis with R](https://reader035.fdocuments.us/reader035/viewer/2022081502/554ef0afb4c905911d8b55c9/html5/thumbnails/27.jpg)
27
RESOURCES
• Books:
– An Introduction to Statistical Learning: with Applications in R, G. James, D. Witten, T. Hastie, R. Tibshirani, 2013
– The Art of R Programming: A Tour of Statistical Software Design, N. Matloff, 2011
– R Cookbook (O'Reilly Cookbooks), P. Teetor, 2011
• R Blog:
– http://www.r-bloggers.com