Introduction to R for Big Data Analysis
Wednesday, October 13, 2015, 6:00 pm – 6:45 pm
Raastech, Inc., 2201 Cooperative Way, Suite 600, Herndon, VA 20171, +1-703-884-2223
© Raastech, Inc. 2015 | All rights reserved. Slide 2 of 51 @Raastech
About Me
Harold Dost III @hdost
7+ years of Oracle Middleware experience
OCE (SOA Foundation Practitioner)
Oracle ACE Associate
From Michigan
blog.raastech.com
About Raastech
Small systems integrator founded in 2009
Headquartered in the Washington DC area
Specializes in Oracle Fusion Middleware
Oracle Platinum Partner – 1 in 3,000 worldwide
Oracle SOA Specialized – 1 in 1,500 worldwide
Oracle ACE – 2 of 500 worldwide
100% of consultants are Oracle certified
100% of consultants present at major Oracle conferences
100% of consultants have published books, whitepapers, or articles
Outline
1. Getting Started
Installing R
Installing Tools
Getting Data
2. Understanding R
Data Types
Functions
Data Import Mechanisms
Outline (Cont.)
3. Manipulating Data (Large Data Sets)
Deriving Simple Statistics
Graphing
4. Demo
5. Incorporating into an Enterprise
Using Enterprise Data Sources
Running R in your environment
Familiarizing yourself with Oracle's R offerings
Know CRAN
Comprehensive R Archive Network
Installing R
Windows https://cran.r-project.org/bin/windows/
Mac https://cran.r-project.org/bin/macosx/
Linux https://cran.r-project.org/bin/linux/
Development Tools
RStudio - http://www.rstudio.com/products/rstudio/
Open Source Edition
Commercial License - $995
Eclipse
Sublime Text, TextPad, and other simple text editors
Installing Packages
From CRAN:
install.packages(c("first", "second"))
From anywhere else (a source tarball):
$ sudo R CMD INSTALL package-version.tar.gz
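A minimal sketch for scripts that should only install what is missing. `missing_packages()` is a hypothetical helper, not part of base R; it relies on `requireNamespace()`, which returns `FALSE` instead of stopping when a package is absent. The package names below are only examples.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() answers TRUE/FALSE rather than raising an error
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Only contact CRAN when something is actually missing
needed <- missing_packages(c("stats", "utils"))
if (length(needed) > 0) {
  install.packages(needed)
}
```

In an analysis script you would pass the packages the analysis needs (for example `c("twitteR")`) and then call `library()` on each.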
Data Types
Vectors
Matrices
Arrays
Data Frames
Lists
Factors
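A short sketch creating one object of each data type listed above; the values are arbitrary and only meant to show how each structure is built.

```r
v  <- c(1.5, 2.5, 3.5)                         # vector: values of a single type
m  <- matrix(1:6, nrow = 2)                    # matrix: 2 rows x 3 columns, filled column by column
a  <- array(1:24, dim = c(2, 3, 4))            # array: any number of dimensions (here 3)
df <- data.frame(name  = c("x", "y"),          # data frame: equal-length columns,
                 score = c(10, 20))            #   each column may have its own type
l  <- list(id = 1L, tags = c("a", "b"), data = df)  # list: elements of any type or size
f  <- factor(c("low", "high", "low"),          # factor: categorical values
             levels = c("low", "high"))        #   stored against a fixed set of levels

str(m)
str(df)
table(f)                                       # counts per level
```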
Special Values
Infinity, positive and negative: Inf and -Inf
Not a Number: NaN
Not Available: NA
Complex numbers, e.g. 1+9i
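A quick sketch of how each special value arises and which test function recognizes it.

```r
x <- c(1 / 0, -1 / 0, 0 / 0, NA)   # Inf, -Inf, NaN, NA
x
is.infinite(x)   # TRUE for Inf and -Inf only
is.nan(x)        # TRUE for NaN only
is.na(x)         # TRUE for both NaN and NA

z <- 1+9i        # complex literal
Re(z)            # real part
Im(z)            # imaginary part
Mod(z)           # modulus: sqrt(1^2 + 9^2)
sqrt(as.complex(-1))  # 0+1i; sqrt(-1) on a plain number gives NaN with a warning
```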
Use Case for Infinities
Finding Maximums and Minimums
Placeholder values when others won’t work
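A sketch of the "finding maximums" use case: `running_max()` is a hypothetical helper that starts from `-Inf`, so the first real value always replaces the placeholder, even when every input is negative. Base R's `max(numeric(0))` makes the same choice and returns `-Inf` (with a warning).

```r
# Hypothetical helper: largest value seen so far, starting from -Inf.
# -Inf is smaller than every real number, so a starting value of 0
# would give the wrong answer for all-negative input, but -Inf cannot.
running_max <- function(values) {
  best <- -Inf
  for (v in values) {
    if (v > best) best <- v
  }
  best
}

running_max(c(-5, -2, -9))   # -2
running_max(numeric(0))      # -Inf: nothing was seen
```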
Not a Number (NaN)
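It means something went wrong somewhere:
An undefined operation, such as 0/0 or Inf - Inf
An empty or missing input, such as mean(numeric(0))
An invalid input, such as sqrt(-1)
Check for it with is.nan(x) to keep it from leaking into later results
Don’t use == to find NaN; the comparison returns NA, never TRUE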
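A small sketch of the causes and checks described above. Replacing NaN with 0 is only an assumption for illustration; the right stand-in depends on the analysis.

```r
# Undefined operations produce NaN
nan_values <- c(0 / 0, Inf - Inf, suppressWarnings(sqrt(-1)))
nan_values            # NaN NaN NaN
is.nan(nan_values)    # TRUE TRUE TRUE
nan_values == NaN     # NA NA NA: == cannot detect NaN

# Stop NaN from leaking into later results (0 is an assumed stand-in value)
cleaned <- ifelse(is.nan(nan_values), 0, nan_values)
sum(cleaned)
```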
Assigning NaN
> a = NaN
> a
[1] NaN
Adding NaN
> b = 1
> c = a + b
> c
[1] NaN
Adding a number to NaN (Not a Number) gives NaN.
Comparing NaN to a Regular Number
> d = b == c
> d
[1] NA
Comparing a number to NaN (Not a Number) gives NA, not FALSE.
Comparing NaN to NaN
> e = c == a
> e
[1] NA
Comparing NaN to NaN also gives NA, so == cannot be used to detect NaN.
Detecting NaN
> a
[1] NaN
> is.nan(a)
[1] TRUE
> is.na(a)
[1] TRUE
Because NaN is not a proper number, special functions must be used to detect it; note that is.na() also returns TRUE for NaN. NaN is the result of math gone wrong.
Detecting NA
> e = c == a
> e
[1] NA
> is.nan(e)
[1] FALSE
> is.na(e)
[1] TRUE
As with NaN, special functions must be used to detect NA. is.nan() returns FALSE for NA, so use is.na() to catch both. NA generally indicates missing information.
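A sketch of how missing values behave in a summary statistic; the readings are made-up illustrative values. Most summary functions accept `na.rm = TRUE` to drop NA and NaN before computing.

```r
readings <- c(12, NA, 15, NaN, 18)   # illustrative values with two gaps
is.na(readings)      # NA and NaN both count as missing
is.nan(readings)     # only the NaN
mean(readings)       # NA (or NaN): missing values propagate
mean(readings, na.rm = TRUE)   # NA and NaN are dropped first
sum(!is.na(readings))          # number of usable values
```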
Operators
Assignment (<-, ->, =)
Addition (+)
Subtraction (-)
Division (/)
Multiplication (*)
Exponent (^)
Parentheses ( )
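Each operator from the list in one console-style sketch, including how parentheses change the order of evaluation.

```r
x <- 7            # leftward assignment
2 * x -> y        # rightward assignment: y is now 14
x + y             # addition
y - x             # subtraction
y / 4             # division
x * 2             # multiplication
x ^ 2             # exponent
(x + y) / 3       # parentheses are evaluated first: 21 / 3
x + y / 3         # without them, division binds tighter: 7 + 14 / 3
```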
Math Functions
max()
min()
log()
sqrt()
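The four math functions applied to a small vector. `log()` uses base e unless a `base` is given, and `max()`/`min()` return NA if any input is missing unless `na.rm = TRUE` is set.

```r
values <- c(4, 16, 64)
max(values)              # 64
min(values)              # 4
log(values)              # natural logarithm (base e)
log(values, base = 2)    # 2 4 6
sqrt(values)             # 2 4 8
max(c(3, NA, 9))                # NA: one missing value spoils the result
max(c(3, NA, 9), na.rm = TRUE)  # 9
```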
Deriving Simple Statistics
Minimum
Maximum
Median
Arithmetic Mean
Function estimation
Linear
Log
Exponential
R-Values
Standard Deviation
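A sketch of the statistics listed above on synthetic data (generated here, not taken from the talk). The linear, log, and exponential models are all fitted with `lm()` by transforming a variable; interpreting "R-Values" as R-squared and the correlation coefficient r is an assumption. The exponential model's R-squared is measured on the log scale, so it is not directly comparable with the other two.

```r
set.seed(42)                          # reproducible synthetic data (illustrative only)
x <- 1:50
y <- 10 + 3 * x + rnorm(50, sd = 2)   # roughly linear with a little noise; y stays positive

# Simple summaries
c(min = min(y), max = max(y), median = median(y), mean = mean(y), sd = sd(y))

# Function estimation with lm(): each model is linear in its coefficients
linear_fit <- lm(y ~ x)          # y = a + b * x
log_fit    <- lm(y ~ log(x))     # y = a + b * log(x)
exp_fit    <- lm(log(y) ~ x)     # log(y) = a + b * x, i.e. y = exp(a) * exp(b * x)

coef(linear_fit)
summary(linear_fit)$r.squared    # R-squared of the linear fit
summary(log_fit)$r.squared       # lower here, since the data are linear
cor(x, y)                        # correlation coefficient r
```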
How to define your own functions
firstfunction <- function(arg1, arg2, ...) {
  # statements: compute a result from the arguments
  someoutput <- arg1 + arg2  # placeholder computation
  return(someoutput)
}
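A concrete instance of the template: `coef_of_variation()` is a hypothetical example function (standard deviation divided by mean). It shows how `...` forwards extra arguments, here `na.rm`, to the functions called inside.

```r
# Hypothetical example: coefficient of variation (standard deviation / mean).
# Extra arguments in ... are passed through to sd() and mean(), e.g. na.rm = TRUE.
coef_of_variation <- function(x, ...) {
  cv <- sd(x, ...) / mean(x, ...)
  return(cv)
}

coef_of_variation(c(2, 4, 4, 4, 5, 5, 7, 9))
coef_of_variation(c(2, 4, NA, 9))                 # NA: the missing value propagates
coef_of_variation(c(2, 4, NA, 9), na.rm = TRUE)   # NA dropped before computing
```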
Twitter Example
First, install and load the package
install.packages("twitteR")
library(twitteR)
Twitter Example
Authenticate
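consumer <- "CONSUMER KEY"   # placeholder for your application's consumer key
secret <- "SECRET KEY"       # placeholder for your application's consumer secret
setup_twitter_oauth(consumer, secret)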
Twitter Example
Get Trend Locations
Each location in the result has a WOEID (Where on Earth ID); choose one to pass to getTrends()
availableTrendLocations()
Twitter Example
Get Trends
trends <- getTrends(SOMEWOEID)   # SOMEWOEID: a woeid chosen from availableTrendLocations()
Twitter Example
Retrieve Tweets
tweets <- searchTwitter(trends[XX,XX], n = 1500)   # XX,XX: placeholder row/column of the chosen trend
tweetdf <- do.call("rbind", lapply(tweets, as.data.frame))   # one row per tweet
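The `do.call("rbind", lapply(...))` idiom above can be tried offline. This sketch uses hypothetical records standing in for the objects `searchTwitter()` returns; only the combining pattern is the point.

```r
# Hypothetical records standing in for the objects returned by searchTwitter()
records <- list(
  list(screenName = "alice", retweetCount = 3),
  list(screenName = "bob",   retweetCount = 0),
  list(screenName = "carol", retweetCount = 7)
)

# lapply() converts each record into a one-row data frame ...
rows <- lapply(records, as.data.frame)

# ... and do.call() hands the whole list to rbind() as separate arguments,
# equivalent to rbind(rows[[1]], rows[[2]], rows[[3]])
combined <- do.call("rbind", rows)
combined
```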
Twitter Example
Filter
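complete.cases() treats both NA and NaN as missing, so this keeps only tweets that have a longitude
tweetdf <- tweetdf[complete.cases(tweetdf[,15]),]   # column 15 is assumed to hold longitude
tweetdf <- tweetdf[tweetdf[,15] != 0,]              # drop rows whose longitude is 0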
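The same two filtering steps on a small hypothetical data frame (not real Twitter output), selecting the column by name rather than by position.

```r
# complete.cases() treats both NA and NaN as missing
complete.cases(c(1, NA, NaN, 4))   # TRUE FALSE FALSE TRUE

# Hypothetical tweet-like rows
sample_df <- data.frame(
  screenName = c("a", "b", "c", "d"),
  longitude  = c(-77.4, NA, 0, -122.3),
  latitude   = c(38.9, NA, 0, 47.6)
)
located <- sample_df[complete.cases(sample_df$longitude), ]  # drop the row with no longitude
located <- located[located$longitude != 0, ]                 # drop the zero placeholder row
as.character(located$screenName)   # "a" "d"
```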
Twitter Example
Simplify the dataframe
simpledf <- tweetdf[c("screenName","longitude","latitude")]
Twitter Example
Create Matrix from Dataframe
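tweetMatrix <- cbind(
  longitude = as.numeric(as.character(simpledf$longitude)),  # coordinates may arrive as text;
  latitude  = as.numeric(as.character(simpledf$latitude))    # data.matrix() would turn text into factor codes
)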
Twitter Example
Plot the Latitude and Longitude
plot(tweetMatrix)
Graphing
Image
Contour
Box Chart
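A sketch of the three chart types using datasets that ship with R (`volcano`, an 87 x 61 elevation matrix, and `mtcars`); the titles and axis labels are illustrative choices.

```r
# image(): each matrix cell drawn as a colored rectangle
image(volcano, main = "image(): values as colors")

# contour(): lines connecting cells of equal value
contour(volcano, main = "contour(): lines of equal value")

# boxplot() draws the chart and also returns the statistics it used
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Cylinders", ylab = "Miles per gallon",
              main = "boxplot(): distribution per group")
bp$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$n       # number of cars in each cylinder group
```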
K-Means
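Essentially an iterative search: it repeatedly assigns each point to the nearest of k cluster centers and then moves each center to the mean of its points
Divides a dataset into k clusters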
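A sketch using base R's `kmeans()` on two synthetic, well-separated groups of points (generated here for illustration). Because the search can settle on a poor local solution, `nstart` runs it from several random starting centers and keeps the best result.

```r
set.seed(1)
# Two synthetic groups of 2-D points, centred near (0, 0) and (3, 3)
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)

# k = 2 clusters; nstart = 10 random starts, best one kept
fit <- kmeans(pts, centers = 2, nstart = 10)

fit$centers            # one row per cluster center
table(fit$cluster)     # how many points fell into each cluster
plot(pts, col = fit$cluster, pch = 19)
points(fit$centers, pch = 4, cex = 2, lwd = 2)
```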
Time Series
Stock Quotes
Infection Incidents
Gas Prices
Audio
Etc.
Source: http://www.loc.gov/pictures/resource/hec.23488/
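In R, series like these are usually stored as `ts` objects, which carry a start time and a frequency alongside the values. This sketch uses the `AirPassengers` dataset that ships with R (monthly airline passengers in thousands, 1949-1960); the quarterly values are made up for illustration.

```r
class(AirPassengers)       # "ts"
frequency(AirPassengers)   # 12 observations per cycle (months per year)
start(AirPassengers)       # 1949 1
end(AirPassengers)         # 1960 12

# Any numeric vector becomes a time series once it has a start and a frequency
quarterly <- ts(c(5, 7, 6, 8, 6, 9, 7, 10), start = c(2014, 1), frequency = 4)  # made-up values
window(quarterly, start = c(2015, 1))   # subset by time instead of by position

plot(AirPassengers, ylab = "Passengers (thousands)")
```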
Time Series Analysis
Regression
Forecasting
Time Frequency (FFTs)
Source: http://groups.csail.mit.edu/netmit/sFFT/algorithm.html
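A compact sketch of the three analyses using only base R: a linear trend via `lm()`, a Holt-Winters forecast via `HoltWinters()` and `predict()`, and an FFT via `fft()`. Holt-Winters is one forecasting method among many and is an illustrative choice, not necessarily the one used in the talk. The FFT part uses a synthetic series with a known 12-month cycle so the result is easy to check.

```r
ap <- log(AirPassengers)      # log scale keeps the seasonal swings roughly constant in size
yr <- as.numeric(time(ap))    # time in fractional years

# Regression: straight-line trend through the (log) series
trend_fit <- lm(ap ~ yr)
coef(trend_fit)[["yr"]]       # growth in log passengers per year

# Forecasting: Holt-Winters exponential smoothing with a seasonal component
hw <- HoltWinters(ap)
fc <- predict(hw, n.ahead = 12)   # the next 12 months, still on the log scale
round(exp(fc))                    # back to passenger counts (thousands)

# Time-frequency: FFT of a synthetic series with a known 12-month cycle
set.seed(7)
n <- 120                                            # ten years of monthly points
x <- sin(2 * pi * (0:(n - 1)) / 12) + rnorm(n, sd = 0.2)
amp <- Mod(fft(x))[2:(n / 2)]                       # skip bin 1 (the mean); amp[k] is k cycles per n points
dominant_period <- n / which.max(amp)               # points (months) per cycle
dominant_period
```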
Using Enterprise Data Sources
Database
Streams
Files
Etc.
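A base-R sketch of two of the sources listed, files and streams. The CSV is a temporary file the example writes itself, standing in for an enterprise extract; the chunk size of 4 lines is arbitrary (real jobs would use much larger chunks). Reading through a connection in chunks keeps memory use bounded when a file is too large to load at once.

```r
# Stand-in for an enterprise extract: a small CSV written to a temporary file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, amount = seq(10, 100, by = 10)), path, row.names = FALSE)

# Files: read everything at once
all_rows <- read.csv(path)

# Streams: read the same file in chunks through a connection, keeping a running total
con <- file(path, open = "r")
readLines(con, n = 1)                  # skip the header line
total <- 0
repeat {
  lines <- readLines(con, n = 4)       # next chunk of up to 4 lines
  if (length(lines) == 0) break        # end of file
  chunk <- read.csv(text = lines, header = FALSE, col.names = c("id", "amount"))
  total <- total + sum(chunk$amount)
}
close(con)
total
```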
Oracle R Distribution
Available on Oracle Public Yum
Enhanced dynamic library loading
Enterprise Support Available
Oracle Advanced Analytics
Oracle Linux
Oracle Big Data Appliance
http://www.oracle.com/technetwork/database/database-technologies/r/r-distribution/overview/index.html
Oracle R Enterprise
Component of the Oracle Advanced Analytics Option on Oracle Database EE
Allows use of R in the database without SQL
Save R objects in the database
Easily integrate with OBIEE
http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/overview/index.html
Oracle R Advanced Analytics for Hadoop
Component of the Oracle Big Data Software Connectors Suite, an option for the BDA
Provides abstraction from HiveQL through R, just as Oracle R Enterprise does for SQL
http://www.oracle.com/technetwork/database/database-technologies/bdc/r-advanalytics-for-hadoop/overview/index.html
ROracle
Open Source Package
Maintained by Oracle
Uses the Oracle Call Interface (OCI) to interact with databases
http://www.oracle.com/technetwork/database/database-technologies/r/r-technologies/overview/index.html
Contact Information
Harold Dost III
Principal Consultant
@hdost
Resources
https://en.wikibooks.org/wiki/Statistical_Analysis:_an_Introduction_using_R/R_basics
http://www.r-project.org/
https://docs.oracle.com/cd/E57012_01/doc.141/e56973/toc.htm
http://cran.r-project.org/web/packages/akmeans/index.html
http://cran.r-project.org/web/packages/twitteR/index.html
http://en.wikipedia.org/wiki/K-means_clustering
http://www.rdatamining.com/examples/kmeans-clustering
http://blog.revolutionanalytics.com/2009/02/how-to-choose-a-random-number-in-r.html
https://www.packtpub.com/books/content/text-mining-r-part-2
http://www.eia.gov/totalenergy/data/monthly/index.cfm#consumption