Open Source @ IBM
R Community Update
Augustina Ragwitz, IBM Cognitive Open Tech
August, 2017
Today's Agenda
• What is R?
• How was R created?
• Where is the R community?
• Technical overview
• What's the current status of R?
• What is next for the R community?
• R at IBM
• Let's Get Started
• Call to Action
What is R?
• Free Open Source alternative to Stata, Matlab, SPSS, and SAS
• Preferred by students because it is free; they enter the workforce already trained in it (Fast Company, 2014)
• Developed by and for researchers and analysts who do not have a traditional programming background
• Easy to use; low overhead to get up and running
• Extensible through packages via CRAN, Bioconductor, and rOpenSci
R is a programming environment for statistical analysis + graphics
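As a rough illustration of "statistical analysis + graphics", the sketch below uses only base R and the built-in mtcars dataset. The choice of dataset and model is an assumption made for the example, not something taken from the talk.

```r
# mtcars ships with R, so nothing needs to be installed.
fit <- lm(mpg ~ wt, data = mtcars)   # fuel economy as a linear function of weight

print(summary(mtcars$mpg))           # five-number summary plus mean
print(coef(fit))                     # intercept and slope of the fitted line

# Draw the scatter plot and fitted line into a temporary PDF file.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
dev.off()
```

In an interactive session the pdf() and dev.off() calls can be dropped so the plot opens in a window instead.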
How was R created?
• Ross Ihaka and Robert Gentleman (University of Auckland, NZ) in 1992
• First stable beta: 2000
• Annual x.y.0 releases in spring
• Patches released as needed (x.y.z)
• Final patch release of the previous version just before the new one
• Current major version: 3 (the 3.x series began with 3.0.0)
• Learn more about R core Internals: https://cran.r-project.org/doc/manuals/r-release/R-ints.html
Created by Statisticians for Statisticians
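The x.y.z numbering above can be inspected from inside a running session. This is a minimal sketch; the variable names are made up for illustration.

```r
v <- getRversion()   # version of the running R, e.g. 3.4.1

# Split "x.y.z" into its three integer components.
parts <- as.integer(strsplit(as.character(v), ".", fixed = TRUE)[[1]])
names(parts) <- c("major", "minor", "patch")
print(parts)

# z == 0 marks an annual x.y.0 release; anything else is a patch release.
is_annual_release <- parts[["patch"]] == 0

# Version objects compare directly against strings.
at_least_3 <- v >= "3.0.0"
```

A comparison like the last line is one way to guard code that needs a minimum R version.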
Where is the R community?
• CRAN – R Package Repository
  • https://cran.r-project.org/
  • User-submitted R code to extend the R language
• R Foundation
  • https://www.r-project.org/foundation/
  • Support R community
• R Consortium
  • https://www.r-consortium.org/
  • Founded in 2015
  • Bridge Community and Enterprise Interests
  • Platinum Companies include IBM, Microsoft, RStudio
  • IBM: Board + Steering and Marketing Committees
R: Analyst Technical Overview
Data gathering to analysis to publishing streamlined!
• Convert unstructured data into tables (readr, tidyr)
• One-liner statistical analysis (dplyr)
• Easy data visualization (ggplot2)
• Reactive JavaScript app generated from R code (shiny)
• Publish research + code in HTML, PDF, and other formats (rmarkdown, knitr)
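library(readr)   # read_csv
library(tidyr)   # separate
library(dplyr)   # %>%, filter, group_by, summarise, as_data_frame

# gather
my_data <- read_csv("my_data.csv")
df <- as_data_frame(my_data)

# summarize: drop missing names, split "last,first", count rows per last name
df <- df %>%
  filter(!is.na(name)) %>%
  separate(name, c("last", "first"), sep = ",") %>%
  group_by(last) %>%
  summarise(total = n())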
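To try the summarize step without a my_data.csv file, the same pipeline can be run on a small inline data frame. The names below are invented for illustration, and the sketch assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the contents of my_data.csv.
df <- data.frame(
  name = c("Smith,Ann", "Smith,Bob", NA, "Jones,Cy"),
  stringsAsFactors = FALSE
)

totals <- df %>%
  filter(!is.na(name)) %>%                           # drop the row with no name
  separate(name, c("last", "first"), sep = ",") %>%  # "Smith,Ann" -> last, first
  group_by(last) %>%
  summarise(total = n())                             # rows per last name

print(totals)
```

With this input, totals has one row per last name: Jones with a total of 1 and Smith with a total of 2.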
R: Developer Technical Overview
Integrating R into Production Workflows
Rserve provides a socket interface to existing applications
> install.packages("Rserve")
> library(Rserve)
> Rserve()
Plumber generates web API endpoints from annotated R code
#* @post /sum
addTwo <- function(a, b){ as.numeric(a) + as.numeric(b) }
https://www.rplumber.io/
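A minimal sketch of how the annotated function might be served with plumber. The file name api.R and port 8000 are assumptions for illustration; the server only starts in an interactive session where the plumber package is installed.

```r
# Write the annotated function to a file (hypothetical name: api.R).
api_file <- file.path(tempdir(), "api.R")
writeLines(c(
  "#* @post /sum",
  "addTwo <- function(a, b) {",
  "  as.numeric(a) + as.numeric(b)",
  "}"
), api_file)

# The function is plain R, so it can be checked without a server.
source(api_file)
result <- addTwo("2", "3")
print(result)

# run() blocks until interrupted, so start the server only interactively.
if (interactive() && requireNamespace("plumber", quietly = TRUE)) {
  plumber::plumb(api_file)$run(port = 8000)
}
```

Once the server is running, a request such as curl --data "a=2&b=3" http://localhost:8000/sum should reach addTwo with a and b passed as strings, which is why the function converts them with as.numeric().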
What is the current status of R?
Over 11,000 packages on CRAN!
Most popular open source tool in academic research
Top language among industry data scientists
What's next for the R community?
• Improve Tooling and Support
  • Code Coverage
  • RHub (hosted testing + validation of R packages)
• Big Data and Cloud Improvements
  • Unified Framework/API for Distributed Computing
  • Better database integration via DBI
  • Support scalable Spatiotemporal/raster datasets
• Community Training and Outreach
  • Software Carpentry/Data Carpentry workshops
  • Support R User Groups (RUGs)
  • Diversity Initiatives (R-Ladies)
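For context on the DBI item above, this sketch shows what database access through DBI looks like today. It assumes the DBI and RSQLite packages are installed; the table and its values are made up for illustration.

```r
library(DBI)

# In-memory SQLite database; RSQLite is an assumption, any DBI backend
# is connected the same way through dbConnect().
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, for illustration only.
dbWriteTable(con, "downloads", data.frame(
  pkg = c("dplyr", "ggplot2", "dplyr"),
  n   = c(10, 20, 5)
))

# Aggregation runs in the database; only the result comes back to R.
totals <- dbGetQuery(
  con,
  "SELECT pkg, SUM(n) AS total FROM downloads GROUP BY pkg ORDER BY pkg"
)
dbDisconnect(con)

print(totals)
```

The same dbWriteTable() and dbGetQuery() calls work against other backends; only the driver passed to dbConnect() changes.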
R at IBM
• Learn R and Data Science through CognitiveClass.AI
  • Data Science with R: https://cognitiveclass.ai/learn/data-science-r/
• Data Science Experience + R
  • RStudio: https://datascience.ibm.com/docs/content/analyze-data/rstudio-overview.html
  • R + Watson Natural Language Understanding (NLU): https://apsportal.ibm.com/exchange/public/entry/view/1015c435b898fb629a7e7523be151aed
• DeveloperWorks Code
  • https://developer.ibm.com/code/patterns/detect-change-points-in-iot-sensor-data/
  • https://developer.ibm.com/code/patterns/category/data-science/
R: Let's Get Started
• Install
  • CRAN - https://cran.r-project.org/
  • RStudio - https://www.rstudio.com/
• Learn
  • Install the swirl package - http://swirlstats.com/
  • R for Data Science by Hadley Wickham - http://r4ds.had.co.nz/
  • Statistics with R specialization on Coursera (free to audit) - https://www.coursera.org/specializations/statistics
• Explore
  • Big Data Analysis + Machine Learning with R + Apache Spark
    • R4ML: https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.5/com.ibm.swg.im.infosphere.biginsights.tut.doc/doc/tut_Mod_R4ML.html
  • Data Science for Automotive Lab (R in Jupyter Notebook)
    • https://github.com/kurlare/DSforAutomotive
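Once R is installed, the packages named in this talk can be checked and installed from the R console. This is a sketch only: the package list is an assumption drawn from the earlier slides, and installation is attempted only in an interactive session.

```r
# Packages mentioned in this talk (the list is illustrative, not exhaustive).
wanted <- c("swirl", "readr", "tidyr", "dplyr", "ggplot2")

# requireNamespace() returns FALSE, without raising an error, when a package is absent.
have <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
to_install <- wanted[!have]
print(to_install)

if (interactive() && length(to_install) > 0) {
  install.packages(to_install)   # downloads from the configured CRAN mirror
}

# Start the interactive swirl lessons if swirl is available.
if (interactive() && requireNamespace("swirl", quietly = TRUE)) {
  swirl::swirl()
}
```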
Call to Action
• Find a RUG Meetup
  • https://www.meetup.com/topics/r-project-for-statistical-computing/
• Attend an R conference
  • useR, EARL, RStudio::conf, Open Data Science West/East
• Use Twitter hashtag #rstats
• Join a Mailing List
  • https://www.r-project.org/mail.html
• Submit a community proposal
  • https://www.r-consortium.org/projects/call-for-proposals
• Join a Working Group
  • https://www.r-consortium.org/projects/isc-working-groups
• Subscribe to the IBM Code monthly newsletter
  • https://www.pages03.net/ibmdeveloperworks/developerWorks-IBMCodeNewsletterSubscriptionPage-secure/
• Subscribe to future Code Tech Talks
  • https://www.pages03.net/ibmdeveloperworks/developerWorks-IBMCodeTechTalkSubscriptionPage-secure/
Q & A