@ctjava#r+java
Combining R with Java
Ryan Cuprak, Elsa Cuprak
@ctjava cuprak.info
Agenda
R Overview
R + Java
R + Java EE
What is R?
• Free, open-source alternative to Matlab, SAS, Excel, and SPSS
• R is:
  • Statistical software
  • Language
  • Environment
  • Ecosystem
• Used by Google, Facebook, Bank of America, etc.
• 2 million users worldwide
• Download URL: http://www.r-project.org
What is R?
• The R Foundation is responsible for R.
• Sponsored/supported by industry.
• Licensed under the GPL.
• Implementation of the S programming language
• Name derived from the names of R's authors.
• First implementation ~1997
• Written in C, Fortran, and R
CRAN
• Power of R is packages!
• CRAN = Comprehensive R Archive Network
• Analogous to (Maven) Central
• 6745 packages available
  • Database access
  • Data manipulation
  • Visualization
  • Data modeling
  • Reports
  • Geospatial data analysis
  • Time series/financial data
CRAN Popular Packages
• ggplot2 – package for creating graphs
• rgl – interactive 3D visualizations
• caret – classification and regression training
• survival – tools for survival analysis
• mgcv – generalized additive models
• maps – polygons for plots
• ggmap – Google maps
• xts – manipulates time series data
• quantmod – downloads financial data, plotting, charting
• tidyr – changes layout of datasets
Uses of R
Calculating Credit Risk
Reporting
Data Analysis
Data Visualization
Data Exploration
Clinical Research
Flood Forecasting
Server Failure Modeling
Why not Java?
• Java isn’t “convenient”
• Lacks specialized data structures
• Limited graphing capabilities
• Few statistical libraries available
• Statisticians don’t use Java
• No interactive tools for data exploration
• No built-in support for data import/cleanup
• Re-inventing the wheel is expensive…
R is a DSL + Stat Library
Leveraging R from Java
• Two approaches to integration:
  • rJava – call Java from R
  • JRI – access R from Java
• rJava includes JRI.
• Installed from CRAN: install.packages("rJava")
• Documentation & code:
  • http://www.rforge.net/rJava/
  • https://github.com/s-u/rJava
• R & Java worlds bridged via JNI
Getting Started with R
• Download and install:
  • R: http://www.r-project.org
  • RStudio: http://www.rstudio.com
Basics of R
• Interpreted language
• Functional
• Dynamic typing
• Lexical scoping
• R scripts stored in “.R” files
• Run R commands interactively in R/RStudio, or run scripts with Rscript.
• Language
  • Object-oriented
  • Exceptions
  • Debugging
R Data Types
• Scalar
  • Numeric
  • Decimal
  • Integer
  • Character
  • Logical – true or false
• Vectors – a sequence of numbers or characters, or higher-dimensional arrays like matrices
• Factors – sequence assigning a category to each index
• Lists – collection of objects
• Data frames – table-like structure
NULL & NA
• NULL – indicates an object is absent
• NA – missing values (Not Available)
Language Basics
• # Comments
• Assignment uses "<-", but "=" can also be used
• Variable name rules:
  • Letters, numbers, dot (.), underscore (_)
  • Must start with a letter, or with a dot that is not followed by a number
  • Valid: .test_test, test, test.today
  • Invalid: .2test, _test, _2test
Vectors
• Defining and assigning a vector:
  > x <- c(10,20,30,40,50,60)
• Multiplying a vector:
  > x * 3
  [1]  30  60  90 120 150 180
• Applying a function to a vector:
  > sqrt(x)
  [1] 3.162278 4.472136 5.477226 6.324555 7.071068…
• Access individual elements (indexing starts at 1):
  > x[1]
  [1] 10
• Appending data to a vector:
  > x <- c(x,70)
  > x
  [1] 10 20 30 40 50 60 70
Data Frames
• Set up the data for the frame:
  library(chron)  # provides times()
  boats <- c("Bayou Blue", "Pachyderm", "Spectre", "Flatline")
  model <- c("J30", "Frers 33", "J-125", "Evelyn 32-2")
  phrf <- c(135, 108, -6, 99)
  finish <- times(c("19:53:06", "19:42:18", "19:38:11", "19:45:48"))
  kts <- c(4.09, 4.66, 4.92, 4.46)
• Construct the data frame:
  raceDF <- data.frame(boats, model, phrf, finish, kts)
Data Frames
> summary(raceDF)
        boats          model        phrf            finish              kts
 Bayou Blue:1   Evelyn 32-2:1   Min.   : -6.00   Min.   :19:38:11   Min.   :4.090
 Flatline  :1   Frers 33   :1   1st Qu.: 72.75   1st Qu.:19:41:16   1st Qu.:4.367
 Pachyderm :1   J-125      :1   Median :103.50   Median :19:44:03   Median :4.560
 Spectre   :1   J30        :1   Mean   : 84.00   Mean   :19:44:51   Mean   :4.532
                                3rd Qu.:114.75   3rd Qu.:19:47:37   3rd Qu.:4.725
                                Max.   :135.00   Max.   :19:53:06   Max.   :4.920
Lists
• Generic vector containing other objects
• Example:
  wkDays <- c("Monday","Tuesday","Wednesday","Thursday","Friday")
  dts <- c(15,16,17,18,19)
  devoxx <- c(FALSE,FALSE,TRUE,TRUE,TRUE)
  weekSch <- list(wkDays,dts,devoxx)
Lists
• Member slicing:
  > weekSch[1]
  [[1]]
  [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
• Member referencing:
  > weekSch[[1]]
  [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
• Labeling entries:
  > names(weekSch) <- c("Days","Dates","Devoxx Events")
Matrices
• Defining a matrix:
  > myMatrix <- matrix(1:10, nrow = 2)
  > myMatrix
       [,1] [,2] [,3] [,4] [,5]
  [1,]    1    3    5    7    9
  [2,]    2    4    6    8   10
• Printing out dimensions:
  > dim(myMatrix)
  [1] 2 5
• Adding matrices (element by element):
  > myMatrix + myMatrix
       [,1] [,2] [,3] [,4] [,5]
  [1,]    2    6   10   14   18
  [2,]    4    8   12   16   20
Factors
• Vector whose elements can take on one of a specific set of values.
• Used in statistical modeling to assign the correct number of degrees of freedom.
  > factor(x=c("High School","College","Masters","Doctorate"),
           levels=c("High School","College","Masters","Doctorate"),
           ordered=TRUE)
  [1] High School College     Masters     Doctorate
  Levels: High School < College < Masters < Doctorate
Defining Functions
• Created using function() directive.
• Stored as objects of class function.
  F <- function(<arguments>) {
    # do something
  }
• Functions can be passed as arguments.
• Functions can be nested in other functions.
• Return value is the last expression to be evaluated.
• Functions can take an arbitrary number of arguments.
• Example:
  double.num <- function(x) {
    x * 2
  }
Built-in Datasets
data()
Demo
Review: Linear Regression
Linear regression model: a type of regression model in which the response is a continuous variable that is linearly related to the predictor variable(s).
Review: Linear Regression
What can a linear regression do?
• Find a linear relationship between height and weight.
• Predict a person's weight based on his/her height.
Example:
Given the observations, weight (Y) and height (X), the parameters in the model can be estimated:
  Y = β0 + β1·X + ε
  (Y: response, β0: intercept, β1: coefficient, X: predictor, ε: error)
Assumptions of the linear regression model:
1) the errors have constant variance
2) the errors have zero mean
3) the errors come from the same normal distribution
Review: Linear Regression
Set up the data…
Review: Linear Regression
Perform the linear regression…
Review: Linear Regression
Plot the results…
Considerations
1. Do you want to re-implement that logic in Java?
2. How would you test your implementation?
3. What would be the ramifications of incorrect calculations?
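To give a feel for question 1, here is a minimal sketch of what re-implementing just the point estimates of a one-predictor regression in Java might look like. The class name, method names, and sample numbers are made up for illustration. It uses the textbook least-squares formulas and produces none of the standard errors, residual diagnostics, or plots that R's lm() provides.

```java
/** Minimal ordinary least squares for one predictor: y = intercept + slope * x. */
final class SimpleLinearRegression {
    final double intercept;
    final double slope;

    private SimpleLinearRegression(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    /** Estimates intercept and slope from paired observations. */
    static SimpleLinearRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        return new SimpleLinearRegression(meanY - slope * meanX, slope);
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Made-up heights (cm) and weights (kg), for illustration only.
        double[] height = {150, 160, 170, 180, 190};
        double[] weight = {52, 58, 66, 74, 80};
        SimpleLinearRegression model = fit(height, weight);
        System.out.printf("intercept=%.3f slope=%.3f%n", model.intercept, model.slope);
        System.out.printf("predicted weight at 175 cm: %.1f kg%n", model.predict(175));
    }
}
```

Even this small amount of code raises questions 2 and 3: each formula needs its own tests, and anything beyond the two coefficients would have to be written and validated separately.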
R + Java = rJava
• The rJava package bundles JRI, which provides a Java API to R.
• rJava itself lets R code call into Java.
• JRI runs R inside the JVM process via JNI.
• Single-threaded – R can be accessed ONLY by one thread!
• Native library can be loaded only ONCE.
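One way to respect the single-thread restriction is to route every R call through a dedicated thread. The sketch below is an assumption about how an application might do that, not something taken from rJava. SingleThreadedR and its methods are hypothetical names, and the engine is a type parameter (with JRI it would be the Rengine instance) so that the example compiles without the native library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Funnels every use of the engine through one dedicated thread. */
final class SingleThreadedR<E> implements AutoCloseable {
    // A single-thread executor guarantees that all submitted work runs on the same thread.
    private final ExecutorService rThread = Executors.newSingleThreadExecutor(runnable -> {
        Thread t = new Thread(runnable, "r-engine");
        t.setDaemon(true); // do not keep the JVM alive just for this thread
        return t;
    });
    private final E engine;

    SingleThreadedR(E engine) {
        this.engine = engine;
    }

    /** Runs the work on the R thread and blocks the caller until the result is ready. */
    <T> T call(Function<E, T> work) {
        Callable<T> task = () -> work.apply(engine);
        try {
            return rThread.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for R", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("R call failed", e.getCause());
        }
    }

    @Override
    public void close() {
        rThread.shutdown();
    }

    public static void main(String[] args) {
        // A String stands in for the real engine in this demo.
        try (SingleThreadedR<String> r = new SingleThreadedR<>("stand-in engine")) {
            String thread = r.call(engine -> Thread.currentThread().getName());
            System.out.println("R work ran on thread: " + thread);
        }
    }
}
```

Callers on any thread can use call(), but the work itself never overlaps. Calling call() from inside work that is already running on the R thread would deadlock, so nested calls should use the engine directly.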
rJava and Maven
<dependency>
  <groupId>org.nuiton.thirdparty</groupId>
  <artifactId>JRI</artifactId>
  <version>0.9-6</version>
</dependency>
Configuring Project (non-Maven/SE)
Folder containing JNI library
• Use R.home() to locate the installation directory.
• rJava under library/rJava
Runtime Parameters
-DR_HOME
-Djava.library.path
-Denv.R_HOME
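A wrong java.library.path usually shows up only as an UnsatisfiedLinkError at startup, so a small pre-flight check can help. The sketch below is an assumption about how such a check might look. JriPreflight and findNativeLibrary are hypothetical names, and the library base name "jri" is inferred from the libjri file name. It relies on System.mapLibraryName for the platform file name, so on macOS it may miss a library that carries the older .jnilib suffix.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

/** Checks whether a native library file is present in one of the listed directories. */
final class JriPreflight {

    /** Returns the first matching file for libName in the path list, if any. */
    static Optional<Path> findNativeLibrary(String libName, String pathList) {
        // Platform-specific file name, e.g. libjri.so on Linux.
        String fileName = System.mapLibraryName(libName);
        for (String dir : pathList.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        System.out.println("R_HOME (environment) = " + System.getenv("R_HOME"));
        Optional<Path> jri = findNativeLibrary("jri", libraryPath);
        System.out.println(jri.map(p -> "JRI native library found at " + p)
                .orElse("JRI native library not found on java.library.path"));
    }
}
```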
Starting R
• Interact with R via Rengine.
• Initialize Rengine with instance of RMainLoopCallbacks.
Simple rJava Example
Advanced rJava Example
R Scripts
Wait – I have to embed all of my R code in Java??
Java EE + R
JSR 352 - Batching
Java EE Container Integration
• Add the following libraries to the container lib directory (glassfish4/glassfish/domains/<domain>/lib):
  • JRI.jar
  • JRIEngine.jar
  • libjri.jnilib (native code!)
  • REngine.jar
Do NOT include rJava dependencies in your WAR/EAR!
JSR 352 Basic Concepts
Job Operator
Job
Step
Job Repository
ItemReader
ItemProcessor
ItemWriter
Batchlet
JSR 352 Basic Concepts
• Job – encapsulates the entire batch process.
• JobInstance – actual execution of a job.
• JobParameters – parameters passed to a job.
• Step – encapsulates an independent, sequential phase of a batch job.
• Batch checkpoints:
  • Bookmarking of progress so that a job can be restarted.
  • Important for long-running jobs
JSR 352 Basic Concepts
• Step models:
  • Chunk – comprised of Reader/Processor/Writer
  • Batchlet – task-oriented step (file transfer etc.)
• Partitioning – mechanism for running steps in parallel
• Listeners – provide life-cycle hooks
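The chunk model can be pictured as a read/process/write loop. The sketch below imitates that loop in plain Java so it runs without a batch runtime. It does not use the javax.batch API, and ChunkStepSketch, run, and the itemCount parameter are hypothetical names. In this sketch a null result from the processor drops the item, and each completed write marks the point where a real runtime would record a checkpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Plain-Java imitation of a chunk-oriented step; not the javax.batch API. */
final class ChunkStepSketch {

    /**
     * Reads items one at a time, processes each, and hands the results to the
     * writer in lists of at most itemCount elements.
     * Returns the number of chunks written.
     */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int itemCount) {
        if (itemCount < 1) {
            throw new IllegalArgumentException("itemCount must be >= 1");
        }
        int chunks = 0;
        List<O> buffer = new ArrayList<>(itemCount);
        while (reader.hasNext()) {
            O out = processor.apply(reader.next());
            if (out != null) {           // null drops the item in this sketch
                buffer.add(out);
            }
            if (buffer.size() == itemCount) {
                writer.accept(new ArrayList<>(buffer)); // a checkpoint would follow here
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {         // final, partially filled chunk
            writer.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<Integer> reader = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        Function<Integer, Integer> processor = n -> n * n;
        Consumer<List<Integer>> writer = chunk -> System.out.println("writing chunk " + chunk);
        int chunks = run(reader, processor, writer, 3);
        System.out.println("chunks written: " + chunks);
    }
}
```

A batchlet, by contrast, has no such loop: it is a single task that runs to completion.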
Initializing R in Singleton Bean
Example: Road Race Statistics
Example Batch Job: 5k Racing
Process overview
• ResultRetrieverBatchlet – Downloads raw data from website.
• RaceResultsReader – Extracts individual runners from the raw data.
• RaceResultsProcessor – Parses a runner’s results.
• RaceResultsWriter – Writes the statistics to the database.
• RaceAnalysisBatchlet – Uses R to analyze race results.
Notes:
• JAX-RS used to retrieve the results from the website.
• JPA to persist the results.
• R script extracts the results from PostgreSQL (not passed in)
Challenges
• R can be a memory hog!
• A crash takes down R + Java + container!
• Solution: run R scripts externally
• Note: plotting requires X!
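Running R externally can be sketched with ProcessBuilder, so that an R crash or memory spike stays in its own OS process. This is one possible approach under stated assumptions, not the presenters' code. It assumes Rscript is on the PATH, and ExternalRScript and the script name analysis.R are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Launches an R script in a separate process instead of inside the JVM. */
final class ExternalRScript {

    /** Builds the command line: Rscript --vanilla <script> <args...>. */
    static List<String> command(Path script, List<String> scriptArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("Rscript");      // assumes Rscript is on the PATH
        cmd.add("--vanilla");    // skip saved workspaces and profile files
        cmd.add(script.toString());
        cmd.addAll(scriptArgs);
        return cmd;
    }

    /** Runs the script, waits up to timeoutSeconds, and returns its exit code. */
    static int run(Path script, List<String> scriptArgs, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(script, scriptArgs))
                .inheritIO()     // show the script's output on this process's console
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("R script timed out after " + timeoutSeconds + " s");
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            // Without arguments, only show what would be executed.
            System.out.println("Would run: " + command(Paths.get("analysis.R"), Arrays.asList("2014")));
            return;
        }
        List<String> scriptArgs = Arrays.asList(args).subList(1, args.length);
        int exit = run(Paths.get(args[0]), scriptArgs, 60);
        System.out.println("Rscript exited with " + exit);
    }
}
```

The trade-off is that data has to cross the process boundary through files, a database, or standard streams rather than through in-memory objects.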
#DVXFR14 @ctjava#r+java
Summary
Q & A
Questions
[email protected] (Java)
[email protected] (Stats)
@ctjava