R, Git, Github, and CI

Taiwan R User Group

Wush Wu

2014-09-20

DSC 2014

2014 was the first year of the Data Science Conference (DSC) in Taiwan.

We (the Taiwan R User Group) organized the R Tutorial Program at DSC.

More than 100 students joined us during DSC 2014.

The average rating was above 4.2 (on a scale of 1 to 5).

Goal of Tutorial

Systematically introduce the analysis steps with R:

Basic

Data Manipulation (Extract, Transform, and Load)

Analysis

Visualization

Based on the latest tools of R

Reproducibility of examples

Integration of materials

Well-designed exercises

About Me

PhD Candidate in NTU EE

Current research field:

Online Advertisement

Large Scale Predictive Modeling

Organizer of Taiwan R User Group

Organizer of Tutorial Program in DSC 2014

Outline

Share the experience of organizing a tutorial program with 16 people, using:

Git, my favorite version control tool

Github, a platform for cooperation

Jenkins, an automation system

I will show how to combine these tools with an R package

Why R Package

The examples and exercises have many dependencies

An R package is the recommended way to share your code

Wrap all materials in one R package, DSC2014Tutorial, so the students only need to download once:

All slides are included

Customized R API

All data

Installation of the packages it depends on

Solves portability issues (Windows, Mac, and Ubuntu)

The package is easily managed with git and released on github

The structure of R package
Dependencies

DESCRIPTION

Package: DSC2014Tutorial
Type: Package
Title: Materials of Tutorial Program on DSC 2014
Version: 1.2
Date: 2014-08-03
Author: Taiwan R User Group
Maintainer: Wush Wu
Description: This package contains the required materials of R Tutorial DSC2014
License: GPL (>= 3)
Depends: R (>= 3.1.0)
Imports: tools, ...
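As a rough illustration of how the DESCRIPTION file drives dependency handling, the sketch below reads a cut-down copy of it with read.dcf() and checks whether the imported packages are available. The temporary file and the shortened Imports list (only tools, since the slide elides the rest) are assumptions made for the example, not the package's real file.

```r
# Write a cut-down DESCRIPTION to a temporary file. Field values are copied
# from the slide; the Imports list is shortened to the one visible entry.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: DSC2014Tutorial",
  "Type: Package",
  "Version: 1.2",
  "License: GPL (>= 3)",
  "Depends: R (>= 3.1.0)",
  "Imports: tools"
), desc_file)

# read.dcf() parses the "Field: value" format into a character matrix
# with one row per record and one column per requested field.
desc <- read.dcf(desc_file, fields = c("Package", "Version", "Depends", "Imports"))
print(desc[1, ])

# Split the comma-separated Imports field and test each package.
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
available <- vapply(imports, requireNamespace, logical(1), quietly = TRUE)
print(available)
```

When a package is installed from a repository, R resolves the Depends and Imports fields itself, which is what lets the students install everything in one step.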

The structure of R package
Data

data

data(salary, package = 'DSC2014Tutorial')
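A small sketch of the same data() mechanism, using the cars data set from the base datasets package so that it runs without the tutorial package. Loading into a separate environment is a choice made for this example; the salary call is guarded because DSC2014Tutorial may not be installed.

```r
# Load a packaged data set into its own environment instead of the
# global workspace, so nothing existing gets overwritten.
env <- new.env()
data(list = "cars", package = "datasets", envir = env)
str(env$cars)

# The call from the slide, run only when the tutorial package is present.
if (requireNamespace("DSC2014Tutorial", quietly = TRUE)) {
  data(list = "salary", package = "DSC2014Tutorial", envir = env)
}
```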

The structure of R package
cross-platform

configure.ac / configure

The structure of R package
slides and external source

system.file('Basic', package = 'DSC2014Tutorial')
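The following sketch shows how system.file() behaves, using the base stats package so that it runs anywhere. The directory name "no-such-directory" is made up to show the not-found case; the last lines repeat the lookup from the slide and only list files if DSC2014Tutorial is installed.

```r
# system.file() returns the full path of a file or directory inside an
# installed package, whatever the platform or library location.
desc_path <- system.file("DESCRIPTION", package = "stats")
print(desc_path)

# A path that does not exist yields an empty string rather than an error.
missing_path <- system.file("no-such-directory", package = "stats")
print(nzchar(missing_path))

# The lookup from the slide; an uninstalled package also yields "".
slides_dir <- system.file("Basic", package = "DSC2014Tutorial")
if (nzchar(slides_dir)) print(list.files(slides_dir))
```

Files placed under inst/ in the package source are copied to the top level of the installed package, which is why slides and other external sources can be found this way.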

Git, Version Control

Some speakers were new to git

We used the following features:

Personal version control: add, commit

Repository: remote, push, pull, and merge

Cooperation: submodule

Git plays a fundamental role in our workflow

Why Git?

Speed is king

Local commits rock

Github

My favorite

Github

The most popular platform for managing git repositories

Provides many convenient features:

Organization accounts

Designed for cooperation

Simple integration with many popular CI tools

Static websites (sufficient for an R repository)

Release R Package on Github

The R package is released as:

a git repository

an R repository

Github and R Repository

How to establish an R repository on github:

Create a new git repository named R

Add the content of R repository into git repository in branch gh-pages

Push and wait

The R Repository is located at http://.github.io/R
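The "content of R repository" in the steps above is mostly a directory layout plus index files. A minimal local sketch, assuming a temporary directory in place of the gh-pages checkout and a made-up package demoPkg that contains only a DESCRIPTION file:

```r
# Hypothetical stand-in for a checkout of the gh-pages branch.
repo_root <- file.path(tempdir(), "R")
# Source packages live under src/contrib inside a repository.
src_dir <- file.path(repo_root, "src", "contrib")
dir.create(src_dir, recursive = TRUE, showWarnings = FALSE)

# Build a tiny source tarball by hand. "demoPkg" is invented for this
# example; a real tarball would come from R CMD build.
build_dir <- tempfile("build")
dir.create(file.path(build_dir, "demoPkg"), recursive = TRUE)
writeLines(
  c("Package: demoPkg", "Version: 0.1", "Title: Demo", "License: GPL (>= 3)"),
  file.path(build_dir, "demoPkg", "DESCRIPTION")
)
old_wd <- setwd(build_dir)
utils::tar(file.path(src_dir, "demoPkg_0.1.tar.gz"),
           files = "demoPkg", compression = "gzip")
setwd(old_wd)

# write_PACKAGES() scans the directory and writes the PACKAGES index
# files that install.packages() reads; it returns the package count.
n_indexed <- tools::write_PACKAGES(src_dir, type = "source")
print(n_indexed)
print(list.files(src_dir))
```

Binary packages go through the same call with type = "win.binary" or type = "mac.binary", each in its own contrib directory. Committing and pushing the resulting tree to gh-pages is what publishes it.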

Users can install the binary of DSC2014Tutorial directly via:

install.packages("DSC2014Tutorial", repos = "http://TaiwanRUserGroup.github.io/R")

Cooperation

I cannot build all the tutorial slides myself:

There are 7 slide decks built by different groups of speakers

Each slide deck should be managed by its author:

Each slide deck is a standalone git repository

No branching here, because not all speakers are familiar with git

Use git submodule to embed these slides into the R package

We need a modern workflow to control quality

Workflow 1

Each speaker creates the slides and initializes a git repository

Speakers commit their changes to the git repository

Open a pull request

Slide review and testing on different platforms

Merge changes to DSC2014Tutorial

Commits

Pull Requests

Review

Merge

Slide Review

The speakers review each other's slides

Comments are posted to the Issues page on github

The speaker should resolve the posted issues

Issues

Challenge

After the first rehearsal at the Taiwan R User Group, we noticed a serious encoding issue:

The default Chinese encoding differs between platforms
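To make the encoding problem concrete, here is a small sketch that converts the same Chinese text between UTF-8 and Big5. The slides do not say which encodings were involved; Big5 (CP950) on Traditional Chinese Windows versus UTF-8 on typical Mac and Ubuntu setups is the usual mismatch and is assumed here. The sketch also assumes the platform's iconv supports the name "BIG5".

```r
# The two-character word for "Chinese", written with Unicode escapes so
# the script does not depend on the encoding of the source file.
utf8_text <- "\u4e2d\u6587"

# The same text as Big5 bytes, as a Big5 locale would store it.
big5_text <- iconv(utf8_text, from = "UTF-8", to = "BIG5")

# The byte sequences differ, so a file written on one platform looks
# garbled when read with the other platform's default encoding.
print(charToRaw(utf8_text))  # 3 bytes per character in UTF-8
print(charToRaw(big5_text))  # 2 bytes per character when Big5 is available

# Declaring the source encoding explicitly recovers the original text.
round_trip <- iconv(big5_text, from = "BIG5", to = "UTF-8")
print(round_trip == utf8_text)
```

Stating the encoding explicitly when reading files, instead of relying on the platform default, avoids this class of bug.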

Challenge

We could resolve the specific issue

The slides are still evolving, so new bugs might appear

We need to test the slides, but there are 7 slide decks and we want to test them on Windows, Ubuntu, and Mac

Why CI

CI automates the following:

Testing

Integration

Deployment

CI makes my life better

CI also introduces some problems. Let's discuss them later.

Test R Package

R CMD check --no-codoc --no-manual --no-vignettes --no-build-vignettes

Deploy R Package

git push

Commit to R Repository

tools::write_PACKAGES( type = c("source", "mac.binary", "win.binary") )

R and CI
travis-ci.org

Existing work for R and Travis-ci

https://github.com/craigcitro/r-travis/wiki

.travis.yml

language: c
script: ./travis-tool.sh run_tests
after_failure:
  - ./travis-tool.sh dump_logs
before_install:
  - curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh
  - chmod 755 ./travis-tool.sh
  - ./travis-tool.sh bootstrap
  - ./travis-tool.sh r_binary_install XML Rcpp knitr brew RUnit inline highlight formatR highr markdown rgl
install:
  - ./travis-tool.sh install_deps
  - ./travis-tool.sh install_github hadley/testthat
notifications:
  email:
    on_success: change
    on_failure: change
env:

R and CI
jenkins

Setup Jenkins

Github Plugin http://sanketdangi.com/post/62740311628/integrate-jenkins-github-trigger-build-process

Github Pull Request Builder http://www.kabisa.nl/building-github-pull-requests-with-jenkins/

Firewall (open to 192.30.252.0/22)

Auto Testing

Result

Discussion

No Errors vs. No Warnings

Existing Problems:

Memory issues

Unknown Bugs

Unclear Message

Summary

Tutorial and R Package

Git and R Package

Github and R Package

CI and R Package

Q&A

Thanks for listening