Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

@ctjava #r+java Combining R with Java Ryan Cuprak Elsa Cuprak @ctjava cuprak.info

Page 1: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

@ctjava#r+java

Combining R with Java

Ryan Cuprak, Elsa Cuprak
@ctjava, cuprak.info

Page 2: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Combining R with Java

Page 3: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Agenda

R Overview

R + Java

R + Java EE

Page 4: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

What is R?
• Free, open-source alternative to Matlab, SAS, Excel, and SPSS
• R is:
  • Statistical software
  • Language
  • Environment
  • Ecosystem
• Used by Google, Facebook, Bank of America, etc.
• 2 million users worldwide
• Download URL: http://www.r-project.org

Page 5: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

What is R?
• The R Foundation is responsible for R.
• Sponsored/supported by industry.
• Licensed under the GPL.
• Implementation of the S programming language.
• Name derived from the names of R's authors.
• First implementation ~1997
• Written in C, Fortran, and R

Page 6: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

CRAN
• Power of R is packages!
• CRAN = Comprehensive R Archive Network
• Analogous to (Maven) Central
• 6745 packages available, covering:
  • Database access
  • Data manipulation
  • Visualization
  • Data modeling
  • Reports
  • Geospatial data analysis
  • Time series/financial data

Page 7: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

CRAN Popular Packages
• ggplot2 – package for creating graphs
• rgl – interactive 3D visualizations
• caret – training regression and classification models
• survival – tools for survival analysis
• mgcv – generalized additive models
• maps – polygons for plots
• ggmap – Google Maps
• xts – manipulates time series data
• quantmod – downloads financial data, plotting, charting
• tidyr – changes layout of datasets

Page 8: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Uses of R
• Calculating Credit Risk
• Reporting
• Data Analysis
• Data Visualization
• Data Exploration
• Clinical Research
• Flood Forecasting
• Server Failure Modeling

Page 9: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Why not Java?
• Java isn’t “convenient”
• Lacks specialized data structures
• Limited graphing capabilities
• Few statistical libraries available
• Statisticians don’t use Java
• No interactive tools for data exploration
• No built-in support for data import/cleanup
• Re-inventing the wheel is expensive…

R is a DSL + Stat Library

Page 10: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Leveraging R from Java
• Two directions of integration:
  • rJava – call Java from R
  • JRI – access R from Java
• rJava includes JRI.
• Installed from CRAN: install.packages("rJava")
• Documentation & code:
  • http://www.rforge.net/rJava/
  • https://github.com/s-u/rJava
• R & Java worlds bridged via JNI

Page 11: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Getting Started with R
• Download and install:
  • R: http://www.r-project.org
  • RStudio: http://www.rstudio.com

Page 12: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Basics of R
• Interpreted language
• Functional
• Dynamic typing
• Lexical scoping
• R scripts stored in “.R” files
• Run R commands interactively in R or RStudio, or run script files with Rscript.
• Language features:
  • Object-oriented
  • Exceptions
  • Debugging

Page 13: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

R Data Types
• Scalar
  • Numeric
    • Decimal
    • Integer
  • Character
  • Logical – TRUE or FALSE
• Vectors – a sequence of numbers or characters, or higher-dimensional arrays like matrices
• Factors – sequence assigning a category to each index
• Lists – collection of objects
• Data frames – table-like structure

Page 14: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

NULL & NA
• NULL – indicates an object is absent
• NA – missing values (Not Available)

Page 15: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Language Basics
• # Comments
• Assignment uses “<-”, but “=” can also be used
• Variable naming rules:
  • Letters, numbers, dot (.), underscore (_)
  • Must start with a letter, or with a dot that is not followed by a number
  • Valid: .test_test, test, test.today
  • Invalid: .2test, _test, _2test

Page 16: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Vectors
• Defining and assigning a vector:
    > x <- c(10,20,30,40,50,60)
• Multiplying a vector:
    > x * 3
    [1]  30  60  90 120 150 180
• Applying a function to a vector:
    > sqrt(x)
    [1] 3.162278 4.472136 5.477226 6.324555 7.071068…
• Accessing individual elements (indexing starts at 1):
    > x[1]
    [1] 10
• Appending data to a vector:
    > x <- c(x,70)
    > x
    [1] 10 20 30 40 50 60 70

Page 17: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Data Frames
• Set up the data for the frame:
    library(chron)  # provides times()
    boats <- c("Bayou Blue", "Pachyderm", "Spectre", "Flatline")
    model <- c("J30", "Frers 33", "J-125", "Evelyn 32-2")
    phrf <- c(135, 108, -6, 99)
    finish <- times(c("19:53:06", "19:42:18", "19:38:11", "19:45:48"))
    kts <- c(4.09, 4.66, 4.92, 4.46)
• Construct the data frame:
    raceDF <- data.frame(boats,model,phrf,finish,kts)

Page 18: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Data Frames
> summary(raceDF)
        boats           model        phrf            finish              kts
 Bayou Blue:1   Evelyn 32-2:1   Min.   : -6.00   Min.   :19:38:11   Min.   :4.090
 Flatline  :1   Frers 33   :1   1st Qu.: 72.75   1st Qu.:19:41:16   1st Qu.:4.367
 Pachyderm :1   J-125      :1   Median :103.50   Median :19:44:03   Median :4.560
 Spectre   :1   J30        :1   Mean   : 84.00   Mean   :19:44:51   Mean   :4.532
                                3rd Qu.:114.75   3rd Qu.:19:47:37   3rd Qu.:4.725
                                Max.   :135.00   Max.   :19:53:06   Max.   :4.920

Page 19: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Lists
• Generic vector containing other objects
• Example:
    wkDays <- c("Monday","Tuesday","Wednesday","Thursday","Friday")
    dts <- c(15,16,17,18,19)
    devoxx <- c(FALSE,FALSE,TRUE,TRUE,TRUE)
    weekSch <- list(wkDays,dts,devoxx)

Page 20: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Lists
• Member slicing:
    > weekSch[1]
    [[1]]
    [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
• Member referencing:
    > weekSch[[1]]
    [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
• Labeling entries:
    > names(weekSch) <- c("Days","Dates","Devoxx Events")

Page 21: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Matrices
• Defining a matrix:
    > myMatrix <- matrix(1:10, nrow = 2)
    > myMatrix
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    3    5    7    9
    [2,]    2    4    6    8   10
• Printing out dimensions:
    > dim(myMatrix)
    [1] 2 5
• Adding matrices (element by element):
    > myMatrix + myMatrix
         [,1] [,2] [,3] [,4] [,5]
    [1,]    2    6   10   14   18
    [2,]    4    8   12   16   20

Page 22: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Factors
• Vector whose elements can take on one of a specific set of values.
• Used in statistical modeling to assign the correct number of degrees of freedom.
    > factor(x=c("High School","College","Masters","Doctorate"),
             levels=c("High School","College","Masters","Doctorate"),
             ordered=TRUE)
    [1] High School College     Masters     Doctorate
    Levels: High School < College < Masters < Doctorate

Page 23: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Defining Functions
• Created using the function keyword.
• Stored as objects of class "function".
    F <- function(<arguments>) {
      # do something
    }
• Functions can be passed as arguments.
• Functions can be nested in other functions.
• Return value is the last expression to be evaluated.
• Functions can take an arbitrary number of arguments.
• Example:
    double.num <- function(x) {
      x * 2
    }

Page 24: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Built-in Datasets
    data()

Page 25: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Demo

Page 26: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Review: Linear Regression
Linear regression model: a type of regression model in which the response is a continuous variable and is linearly related to the predictor variable(s).

Page 27: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Review: Linear Regression
What can a linear regression do?
• Find a linear relationship between height and weight.
• Predict a person's weight based on his/her height.

Example: given the observations, weight (Y) and height (X), the parameters in the model can be estimated.

    Y = \beta_0 + \beta_1 X + \varepsilon

where Y is the response, \beta_0 the intercept, \beta_1 the coefficient, X the predictor, and \varepsilon the error.

Assumptions of the linear regression model:
1) the errors have constant variance
2) the errors have zero mean
3) the errors come from the same normal distribution
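The estimation step for the one-predictor model above can be sketched in plain Java. This is only an illustration of what a hand-written version involves, assuming ordinary least squares (the method R's lm() uses); the class and method names are hypothetical, and the numbers in main are synthetic points, not data from the talk.

```java
/**
 * Ordinary least squares fit of y = intercept + slope * x for one predictor.
 * Illustrative sketch only; names are hypothetical and not part of any library.
 */
final class SimpleLinearRegression {
    final double intercept; // estimate of beta_0
    final double slope;     // estimate of beta_1

    private SimpleLinearRegression(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    /** Estimates intercept and slope from paired observations. */
    static SimpleLinearRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical; slope is undefined");
        }
        double slope = sxy / sxx;
        return new SimpleLinearRegression(meanY - slope * meanX, slope);
    }

    /** Predicts the response for a new predictor value. */
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Synthetic points that lie exactly on y = 1 + 2x.
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        SimpleLinearRegression model = fit(x, y);
        System.out.printf("intercept=%.3f slope=%.3f%n", model.intercept, model.slope);
    }
}
```

Note what the sketch leaves out: standard errors, residual diagnostics, multiple predictors, and any check of the assumptions listed above. Those are the parts a statistics package already provides.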

Page 28: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Review: Linear Regression

Page 29: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Review: Linear Regression

Page 30: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Review: Linear Regression

Set up the data…

Page 31: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Review: Linear Regression

Perform the linear regression…

Page 32: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Review: Linear Regression

Plot the results…

Page 33: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Considerations
1. Do you want to re-implement that logic in Java?
2. How would you test your implementation?
3. What would be the ramifications of incorrect calculations?

Page 34: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

R + Java = rJava

• rJava is the R-to-Java interface: it lets R code create and call Java objects.
• JRI (bundled with rJava) is the Java-to-R interface: it lets Java code evaluate R.
• JRI runs R inside the JVM process via JNI.
• Single-threaded – R can be accessed by ONLY one thread!
• Native library can be loaded only ONCE.
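One common way to honor the single-thread restriction is to route every R call through a dedicated worker thread. The sketch below shows that pattern with a plain single-thread executor; SingleThreadGateway is a hypothetical name, and the task in main is only a stand-in for a real R evaluation, which is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Serializes all work onto one dedicated thread, so a resource that tolerates
 * only a single thread (such as an embedded R engine) is never touched concurrently.
 * Hypothetical helper for illustration; not part of rJava or JRI.
 */
final class SingleThreadGateway implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Runs the task on the dedicated thread and blocks until it finishes. */
    <T> T call(Callable<T> task) {
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    /** Stops the worker thread once queued tasks have completed. */
    @Override
    public void close() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        try (SingleThreadGateway gateway = new SingleThreadGateway()) {
            // Stand-in for an R evaluation; a real task would call the R engine here.
            String threadName = gateway.call(() -> Thread.currentThread().getName());
            System.out.println("Task ran on: " + threadName);
        }
    }
}
```

Callers on any thread can use call(), but a task must not call back into the same gateway, since the only worker would then wait on itself.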

Page 35: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

rJava and Maven

<dependency>
  <groupId>org.nuiton.thirdparty</groupId>
  <artifactId>JRI</artifactId>
  <version>0.9-6</version>
</dependency>

Page 36: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Configuring Project (non-Maven/SE)
• Screenshot callout: folder containing the JNI library.
• Use R.home() to locate the installation directory.
• rJava is under library/rJava.

Page 37: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Runtime Parameters
• -DR_HOME
• -Djava.library.path
• -Denv.R_HOME

Page 38: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Starting R
• Interact with R via Rengine.
• Initialize Rengine with an instance of RMainLoopCallbacks.

Page 39: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Simple rJava Example

Page 40: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Advanced rJava Example

Page 41: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

R Scripts

Wait – I have to embed all of my R code in Java??

Page 42: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Java EE + R

JSR 352 - Batching

Page 43: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Java EE Container Integration
• Add the following libraries to the container lib directory (glassfish4/glassfish/domains/<domain>/lib):
  • JRI.jar
  • JRIEngine.jar
  • libjri.jnilib (native code!)
  • REngine.jar

Do NOT include rJava dependencies in your WAR/EAR!

Page 44: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Java EE Container Integration

Page 45: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

JSR 352 Basic Concepts

[Diagram: Job Operator, Job, Step, Job Repository, ItemReader, ItemProcessor, ItemWriter, Batchlet]

Page 46: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

JSR 352 Basic Concepts
• Job – encapsulates the entire batch process.
• JobInstance – actual execution of a job.
• JobParameters – parameters passed to a job.
• Step – encapsulates an independent, sequential phase of a batch job.
• Batch checkpoints:
  • Bookmarking of progress so that a job can be restarted.
  • Important for long-running jobs.

Page 47: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

JSR 352 Basic Concepts
• Step models:
  • Chunk – composed of a Reader/Processor/Writer
  • Batchlet – task-oriented step (file transfer etc.)
• Partitioning – mechanism for running steps in parallel
• Listeners – provide life-cycle hooks
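The chunk step model can be pictured as a read/process/write loop that hands items to the writer in groups. The following is a plain-Java sketch of that control flow only. It deliberately uses standard functional interfaces as stand-ins rather than the javax.batch types, and ChunkStepSketch is a hypothetical name; a real batch runtime would also record a checkpoint after each chunk is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Plain-Java illustration of the chunk pattern: read one item at a time,
 * process it, and write the results in groups of chunkSize.
 * Hypothetical sketch; it does not use the JSR 352 API.
 */
final class ChunkStepSketch {

    /** Runs the loop and returns how many processed items were written. */
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<O> buffer = new ArrayList<>(chunkSize);
        int written = 0;
        while (reader.hasNext()) {
            O result = processor.apply(reader.next());
            if (result != null) {          // a null result filters the item out
                buffer.add(result);
            }
            if (buffer.size() == chunkSize) {
                writer.accept(new ArrayList<>(buffer)); // hand over a copy of the full chunk
                written += buffer.size();
                buffer.clear();            // a batch runtime would checkpoint here
            }
        }
        if (!buffer.isEmpty()) {           // flush the final, partial chunk
            writer.accept(new ArrayList<>(buffer));
            written += buffer.size();
        }
        return written;
    }

    public static void main(String[] args) {
        List<List<Integer>> chunks = new ArrayList<>();
        int written = ChunkStepSketch.<Integer, Integer>run(
                Arrays.asList(1, 2, 3, 4, 5).iterator(), i -> i * 10, chunks::add, 2);
        System.out.println(written + " items written in " + chunks.size() + " chunks");
    }
}
```

A batchlet, by contrast, has no such loop: it is a single task that runs to completion, which is why it suits steps like a file download or an R analysis run.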

Page 48: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Initializing R in Singleton Bean

Page 49: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Example: Road Race Statistics

Page 50: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Example Batch Job: 5k Racing

Process overview:
• ResultRetrieverBatchlet – Downloads raw data from the website.
• RaceResultsReader – Extracts individual runners from the raw data.
• RaceResultsProcessor – Parses a runner’s results.
• RaceResultsWriter – Writes the statistics to the database.
• RaceAnalysisBatchlet – Uses R to analyze race results.

Notes:
• JAX-RS is used to retrieve the results from the website.
• JPA is used to persist the results.
• The R script extracts the results from PostgreSQL (they are not passed in).

Page 51: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Example Batch Job: 5k Racing

Page 52: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Example Batch Job: 5k Racing

Page 53: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Example Batch Job: 5k Racing

Page 54: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Example Batch Job: 5k Racing

Page 55: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Challenges

• R can be a memory hog!
• A crash takes down R + Java + the container!
• Solution: run R scripts ‘externally’, in a separate process.
• Note: plotting requires X (an X11 display)!
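Running a script externally can be sketched with a child process, so that a crash or runaway memory use in R stays outside the JVM. The example assumes the Rscript executable is on the PATH; the class name, the script name analysis.R, and its argument are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Launches an R script in its own operating-system process.
 * Hypothetical helper for illustration; assumes Rscript is on the PATH.
 */
final class ExternalRScript {

    /** Builds the command line: Rscript --vanilla <script> <args...>. */
    static List<String> command(String scriptPath, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("Rscript");
        cmd.add("--vanilla");   // do not load or save workspace and profile files
        cmd.add(scriptPath);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Runs the script, waits up to timeoutSeconds, and returns its exit code. */
    static int run(String scriptPath, long timeoutSeconds, String... args)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(scriptPath, args))
                .inheritIO()    // pass the script's output through to this process
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("R script timed out: " + scriptPath);
        }
        return process.exitValue();
    }

    public static void main(String[] args) {
        // Prints the command that would be launched; run(...) performs the actual call.
        System.out.println(String.join(" ", command("analysis.R", "2015")));
    }
}
```

The trade-off is that results no longer come back as in-memory objects: the script has to write them somewhere both sides can reach, such as a file or the database.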

Page 56: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Summary

Page 57: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

Q & A

Questions

Page 58: Combining R With Java For Data Analysis (Devoxx UK 2015 Session)

[email protected] (Java)
[email protected] (Stats)
@ctjava