12nov14 Revolution r Open Webinar David Smith 141114124941 Conversion Gate02

  • 8/10/2019 12nov14 Revolution r Open Webinar David Smith 141114124941 Conversion Gate02

    Introducing

    Revolution R Open

    The Enhanced R Distribution

    November 12, 2014

    In today's webinar:

    R Update

    Revolution R Open

    The Reproducible R Toolkit

    MRAN

    Other open-source projects

    DeployR Open

    ParallelR

    RHadoop

    Revolution R Plus

    Q&A

    David Smith

    Chief Community Officer

    Revolution Analytics

    @[email protected]

    Editor, blog.revolutionanal

    Co-author, Introduction to

    http://blog.revolutionanalytics.com/

    OUR COMPANY

    The leading provider

    of advanced analytics

    software and services

    based on open source R, since 2007

    OUR PRODUCT

    REVOLUTION R: The

    enterprise-grade predictive

    analytics application platform

    based on the R language

    SOME KU

    Visiona

    Gartner Magic

    for Advanced A

    Platforms,

    What is R?

    Most widely used data analysis software

    Used by 2M+ data scientists, statisticians and analysts

    Most powerful statistical programming language

    Flexible, extensible and comprehensive for productivity

    Create beautiful and unique data visualizations

    As seen in New York Times, Twitter and Flowing Data

    Thriving open-source community

    Leading edge of analytics research

    Fills the Data Science talent gap

    New graduates prefer R

    www.revolutionanalytics.com

    http://www.revolutionanalytics.com/what-r

    Poll #1

    What software do you use for statistical analysis? (Select all that apply)

    R

    SAS

    SPSS

    Python

    Other

    R's popularity is growing rapidly

    More at blog.revolutionanalytics.com/popularity

    [Figure: R Usage Growth (Rexer Data Miner Survey, 2007-2013)]

    [Figure: Language Popularity (IEEE Spectrum, July 2014)]

    http://blog.revolutionanalytics.com/popularity/

    http://blog.revolutionanalytics.com/2013/10/r-usage-skyrocketing-rexer-poll.html

    http://spectrum.ieee.org/static/interactive-the-top-programming-languages#index

    Revolution R Open is:

    Enhanced Open Source R distribution

    Compatible with all R-related software

    Multi-threaded for performance

    Focus on reproducibility

    Open source (GPLv2 license)

    Available for Windows, Mac OS X, Ubuntu, Red Hat and OpenSUSE

    Download from mran.revolutionanalytics.com

    http://mran.revolutionanalytics.com/download

    http://mran.revolutionanalytics.com/open/

    Multi-threaded performance

    Intel MKL replaces standard BLAS/LAPACK algorithms

    Pipelined operations

    Optimized for Intel, works for all archs

    High-performance algorithms

    [Figure labels: Sequential, Parallel]

    Uses as many threads as there are available cores

    Control with: setMKLthreads()

    No need to change any R code

    Included in RRO binary distribution
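
As a rough illustration of the "no need to change any R code" point, the sketch below times an ordinary dense matrix product, which R hands to whichever BLAS library it is linked against. setMKLthreads() is the function named on the slide; it is assumed here to be defined only in an RRO session with MKL, so the call is guarded and the script still runs on standard R. The matrix size and the thread count of 2 are arbitrary choices for illustration.

```r
# Build a moderately large random matrix; the size is an arbitrary choice.
set.seed(42)
n <- 400
a <- matrix(rnorm(n * n), nrow = n, ncol = n)

# Assumption: setMKLthreads() exists only under RRO with MKL installed,
# so guard the call to keep the script runnable on standard R.
if (exists("setMKLthreads")) {
  setMKLthreads(2)  # illustrative thread count
}

# The multiplication itself is plain R; the linked BLAS does the work.
elapsed <- system.time(b <- a %*% a)[["elapsed"]]
cat("elapsed seconds:", elapsed, "\n")
```

Running the same script with different thread counts is one simple way to see how much of the speedup comes from multi-threading on a given machine.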

    More at Revo

    http://blog.revolutionanalytics.com/2014/10/revolution-r-open-mkl.html

    100% Compatibility

    Built on latest R engine

    Currently R 3.1.1, R 3.1.2 in testing

    100% compatible with R scripts

    R packages

    Applications with R connections

    Designed to work with RStudio

    No configuration required

    Replaces existing R application

    Side-by-side installations

    http://mran.revolutionanalytics.com/documents/rro/installation/#revorinst-sidebyside

    Reproducibility: why do we care?

    Academic / Research

    Verify results

    Advance Research

    Business

    Production code

    Reliability

    Reusability

    Collaboration

    Regulation

    http://www.nytimes.com/2011/07/08/health/research/08genes.html

    http://arxiv

    An R Reproducibility Problem

    Adapted from http://xkcd.com/234/ (CC BY-NC 2.5)

    Reproducible R Toolkit in RRO

    Static CRAN mirror

    CRAN packages fixed with each Revolution R Open update

    Daily CRAN snapshots

    Storing every package version since September 2014

    Binaries and sources

    At mran.revolutionanalytics.com/snapshot

    Easily write and share scripts synced to a specific snapshot

    checkpoint package installed with RRO

    [Diagram components: CRAN; CRAN mirror (http://cran.revolutionanalytics.com/); checkpoint server (midnight UTC); daily snapshots (http://mran.revolutionanalytics.com/snapshot/); RRO; checkpoint package, loaded with library(checkpoint)]
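
To make the idea of a script synced to a dated snapshot concrete, here is a minimal sketch in base R that points the session's CRAN repository at one snapshot date. The helper name snapshot_repo and the URL layout (snapshot address followed by a YYYY-MM-DD date) are assumptions made for illustration; the slides only give the base snapshot address. Nothing is downloaded, so the sketch runs offline.

```r
# Hypothetical helper: build a repository URL for one snapshot date.
# Assumption: snapshots live under <base>/<YYYY-MM-DD>.
snapshot_repo <- function(date,
                          base = "http://mran.revolutionanalytics.com/snapshot") {
  d <- as.Date(date)  # fails early on a malformed date string
  paste(base, format(d, "%Y-%m-%d"), sep = "/")
}

repo <- snapshot_repo("2014-09-17")

# Point this session at the dated repository, then restore the old setting.
old <- options(repos = c(CRAN = repo))
current <- getOption("repos")[["CRAN"]]
options(old)

print(current)
```

With the repository fixed to a date, any later install.packages() call in that session would resolve package versions as of that day rather than the latest ones.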

    Using checkpoint

    Easy to use: add 2 lines to the top of each script

    library(checkpoint)

    checkpoint("2014-09-17")

    For the package author:

    Use package versions available on the chosen date

    Installs packages local to this project

    Allows different package versions to be used simultaneously

    For a script collaborator:

    Automatically installs required packages

    Detects required packages (no need to manually install!)

    Uses same package versions as script author to ensure reproducibility
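
The slide does not say how required packages are detected. One plausible approach, sketched here purely as an assumption, is to scan script lines for library() and require() calls. The function detect_packages is a hypothetical helper written for this example and is not part of the checkpoint package.

```r
# Hypothetical helper: find package names used in library()/require() calls.
# Limitation: commented-out calls are matched too; a real scanner would
# need to parse the code rather than use a regular expression.
detect_packages <- function(lines) {
  pattern <- "(library|require)\\([[:space:]]*[\"']?([A-Za-z][A-Za-z0-9.]*)"
  # Keep only the matched text from lines that contain a call.
  hits <- regmatches(lines, regexpr(pattern, lines))
  # Replace each match by its second capture group, the package name.
  unique(sub(pattern, "\\2", hits))
}

script <- c(
  "library(ggplot2)",
  "x <- 1:10",
  "require('MASS')",
  "library(ggplot2)"
)
pkgs <- detect_packages(script)
print(pkgs)
```

The resulting character vector is the kind of list a tool could then install from a dated snapshot, so a collaborator never has to install packages by hand.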

    MRAN: The Managed R Archive Network

    Download Revol

    Open

    Learn about R a Daily CRAN sna

    Explore Package

    and dependen

    Explore Task Vie

    http://mran.revolutionanalytics.com/

    Revolution Analytics Open Source Projects

    More at projects.revolutionanalytics.com

    http://projects.revolutionanalytics.com/

    DeployR Open

    Goal: embed results from R scripts into existing applications, in real time

    Problem:

    Exposing arbitrary R functions is unwise

    Need to handle concurrent R sessions

    Solution: DeployR Open

    R, on a server, behind a firewall

    Repository Manager defines entry points

    Expose only authorized R functions

    Automatically creates Web Services APIs

    Manages and monitors pool of R sessions

    Separates roles for R and app developer

    DeployR Open: for prototyping integrations

    Revolution R Enterprise adds grid-scaling and enterprise authentication
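
The principle of exposing only authorized R functions can be sketched in a few lines of plain R. This is not the DeployR API: the registry entry_points, the function call_entry_point, and the toy scoring rule are all hypothetical, chosen only to show a fixed allow-list of entry points in front of an R session.

```r
# Hypothetical registry: the only functions a client may invoke.
entry_points <- list(
  # Toy rule for illustration: flag amounts above 1000.
  score = function(amount) as.numeric(amount > 1000)
)

# Dispatch a request by name, rejecting anything not in the registry.
call_entry_point <- function(name, args = list()) {
  ok <- is.character(name) && length(name) == 1 &&
    name %in% names(entry_points)
  if (!ok) {
    stop("not an authorized entry point")
  }
  do.call(entry_points[[name]], args)
}

res <- call_entry_point("score", list(amount = 2500))

# An arbitrary function such as system() is never reachable this way.
blocked <- tryCatch(
  call_entry_point("system", list("ls")),
  error = function(e) "rejected"
)
print(res)
print(blocked)
```

Because requests name an entry point instead of sending code, the application developer needs to know only the entry-point names and arguments, while the R developer controls what runs.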

    More at deployr.revolutiona

    http://deployr.revolutionanalytics.com/

    DeployR: Integration

    DeployR does not provide any application UI.

    3 integration modes embed real-time R results into existing interfaces

    Web app, mobile app, desktop app, BI tool, Excel,

    RBroker Framework (tutorial):

    Simple, high-performance API for Java, .NET and JavaScript apps

    Supports transactional, on-demand analytics on a stateless R session

    Client Libraries (tutorial):

    Flexible control of R services from Java, .NET and JavaScript apps

    Also supports stateful R integrations (e.g. complex GUIs)

    DeployR Web Services API:

    Integrate R using almost any client language

    http://deployr.revolutionanalytics.com/documents/dev/rbroker/

    http://deployr.revolutionanalytics.com/documents/dev/clientlib/

    DeployR: Security / Scalability Layers

    1. Anonymous execution

    Only authorized, user-defined R functions accessible

    No state preserved

    2. Basic username / password authentication

    Managed in DeployR Administration Console

    3. Enterprise Authentication

    Verifies identity with SSO / LDAP / Active Directory / PAM

    4. Adaptive load-balancing grid

    Ensures service availability

    Only available in Revolution R Enterprise DeployR

    DeployR Open demo

    Fraud detection

    RHadoop and ParallelR

    Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms

    ParallelR: parallel programming for multi-CPU servers and grids

    RHadoop: map-reduce programming in the R language

    Mainly useful for embarrassingly parallel problems, where parallel components work with small amounts of data

    Big Data Predictive Analytics mostly not embarrassingly parallel

    80+ pre-built parallel external memory algorithms included with Revolution R Enterprise

    RHadoop

    Collection of packages for interfacing R and Hadoop

    Client (desktop) R interface to Hadoop:

    rhdfs: Browse, read, write and modify files stored in HDFS

    rhbase: Browse, read, write and modify tables stored in HBase

    ravro: Read, write and run map-reduce on Apache Avro files in HDFS

    R computations in Hadoop:

    rmr2: write map-reduce tasks in R to run in Hadoop

    plyrmr: R-based data manipulation computations on data in Hadoop

    RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki

    https://github.com/RevolutionAnalytics/RHadoop/wiki

    Word count in RHadoop

    Map:

    Input: lines of text

    Output: words as keys, each with value 1

    Reduce:

    Input: words, each with several values

    Output: words with counts

    Map-Reduce:

    Apply map to lines of text

    Gather like words together and count
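
The map and reduce steps above can be imitated locally in base R, without Hadoop or the rmr2 package, to show what each stage produces. The input lines and the names map_fn, pairs, grouped and counts are illustrative only; on a cluster the grouping step would be carried out by the framework rather than by split().

```r
# Illustrative input: three lines of text.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: turn one line into (word, 1) pairs.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]  # drop empty tokens
  data.frame(key = words, value = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

# Apply map to every line and collect all pairs.
pairs <- do.call(rbind, lapply(lines, map_fn))

# Shuffle: gather the values that share the same key (word).
grouped <- split(pairs$value, pairs$key)

# Reduce: sum the values for each word to get its count.
counts <- vapply(grouped, sum, integer(1))
print(counts)
```

The same two functions, a per-line map and a per-key sum, are the parts an R user would supply to a map-reduce framework; everything between them is handled by the platform.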

    Word count: execution

    More: Video replay of Using R

    Hadoop by Jeffrey Breen

    http://bit.ly/W35PLR

    ParallelR

    foreach replaces for loops

    Minimal code change required

    Choice of parallel backends

    doParallel (base parallel)

    doMC (multi-core servers)

    doSNOW (grids)

    Iterations run in parallel

    Speedups depend on backend, granularity

    All iterations run in-memory
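
A minimal sketch of the foreach style, using a birthday-problem simulation as an illustrative workload (the function birthday_sim and the group sizes are choices made for this example, not taken from the slide). The loop uses %do%, which runs sequentially; after registering one of the backends listed above, the same loop can be switched to %dopar%. If the foreach package is not installed, the sketch falls back to vapply() so it still runs.

```r
# Estimate the chance that at least two of n people share a birthday.
birthday_sim <- function(n, trials = 2000) {
  mean(replicate(trials,
                 anyDuplicated(sample(365, n, replace = TRUE)) > 0))
}

sizes <- c(10, 23, 40)  # illustrative group sizes
set.seed(1)

if (requireNamespace("foreach", quietly = TRUE)) {
  library(foreach)
  # Each iteration is independent, so %do% can later become %dopar%
  # once a parallel backend has been registered.
  probs <- foreach(n = sizes, .combine = c) %do% birthday_sim(n)
} else {
  # Fallback when foreach is unavailable: same result shape, sequential.
  probs <- vapply(sizes, birthday_sim, numeric(1))
}
print(probs)
```

Because each iteration does a fair amount of work and shares no state with the others, this is the kind of loop where a parallel backend tends to pay off; very short iterations would be dominated by scheduling overhead.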

    birthday

    Introducing

    Revolution R Plus

    Revolution R Plus includes:

    AdviseR Technical Support for:

    Revolution R Open

    Including R, base and recommended packages

    Reproducible R Toolkit

    ParallelR: Parallel programming with R

    RHadoop: R integration with Hadoop

    DeployR Open: Secure deployment of R to applications

    Open Source Assurance for all supported components

    Provides legal indemnity for subscribers

    Workstation subscriptions: $1,800 per year

    Server and Hadoop subscriptions also available

    AdviseR Technical Support

    Technical support for R, from the R experts.

    10x5 email and phone support (in your local time zone)

    Full support for R, validated packages, and third-party software connections

    Notifications of updates and bug fixes

    On-line case management and knowledgebase

    Access to technical resources, documentation and user forums

    Defined service-level agreements for rapid responses

    Included with Revolution R Plus and Revolution R Enterprise.

    Open Source Assurance

    Revolution Analytics will defend Revolution R Plus subscribers should a third party make an intellectual property claim against covered open source software with respect to:

    copyrights, patents, trademarks, trade secrets

    Covered software includes:

    Revolution R Open (incl. R base and recommended packages), Reproducible R Toolkit, DeployR Open, ParallelR, RHadoop

    Revolution Analytics will defend open source software in court

    If necessary, Revolution Analytics will obtain rights, modify, or replace software found to be infringing

    If a resolution can't be found, fees paid in past 12 months will be refunded

    The Revolution R Product Suite

    Revolution R Open: free and open source R distribution, enhanced and distributed by Revolution Analytics

    Revolution R Plus: open-source distribution of R, packages, and other components, enhanced, supported and indemnified by Revolution Analytics

    Revolution R Enterprise: secure, scalable and supported distribution of R, with proprietary components created by Revolution Analytics

    Revolution R Enterprise (RRE)

    The All-Inclusive Big Data Big Analytics Platform

    DistributedR

    DeployR DevelopR

    ScaleR

    ConnectR

    High-performance open source R p

    Data source connectivity to big-da

    Big-data advanced analytics

    Multi-platform environment suppo

    In-Hadoop and in-Teradata predic

    Visual Studio IDE option

    Secure, Scalable R Deployment

    Technical support, training and se

    24x7 support option

    Contact Revolution Analytics for more info: www.revolutionanalytics.

    http://www.revolutionanalytics.com/contact-us

    Poll #2

    Which Revolution Analytics projects do you plan to use (or already use)? Select all that apply:

    1. Revolution R Open (free distribution)

    2. Revolution R Plus (paid subscription for support and indemnification)

    3. Reproducible R Toolkit (checkpoint package)

    4. DeployR Open

    5. RHadoop / ParallelR

    Wrapping up

    Revolution R Open is available now from

    mran.revolutionanalytics.com/download

    Explore Revolution Analytics open-source projects at

    projects.revolutionanalytics.com

    Technical support and open-source assurance with Revolution R Plus

    www.revolutionanalytics.com/plus

    David Smith

    Chief Community Officer

    Revolution Analytics

    @revodav

    david@re

    http://mran.revolutionanalytics.com/download/

    http://projects.revolutionanalytics.com/

    http://www.revolutionanalytics.com/plus

    http://projects.revolutionanalytics.com/rrt/

    http://projects.revolutionanalytics.com/rhadoop/

    http://projects.revolutionanalytics.com/parallelr/

    http://projects.revolutionanalytics.com/deployr/

    Thank you.

    Next up:

    Batter Up! Advanced Sports Analytics with R and Storm

    December 11, 2014

    revolutionanalytics.com/webinars

    www.revolutionanalytics.com

    1.855.GET.REVO

    Twitter: @RevolutionR

    http://www.revolutionanalytics.com/webinars/batter-advanced-sports-analytics-r-and-storm